DETAILED ACTION 
Claim Objections 

1. Claims 15 and 41 are objected to because of the following informalities: 

Claim 15 recites the limitations "constructing a second histogram for each of the 
selected features using the second measurement information; determining a first 
(assumed for examining purpose: second) area encompassed by each of the first 
(assumed for examining purpose: second) histograms; encoding the first (assumed for 
examining purpose: second) areas of the first (assumed for examining purpose: 
second) histograms in metadata elements of a first (assumed for examining purpose: 
second) hypertext markup language (HTML) document; and". 

Claim 41 recites "means for constructing a second histogram for each of the selected 
features using the second measurement information; means for determining a first 
(assumed for examining purpose: second) area encompassed by each of the first 
(assumed for examining purpose: second) histograms; means for encoding the first 
(assumed for examining purpose: second) areas of the first (assumed for examining 
purpose: second) histograms in metadata elements of a first (assumed for examining 
purpose: second) hypertext markup language (HTML) document; and". 

Applicant in claims 15 and 41 recites the measurement of selected features of a 
first object and a second object. In these claims, the first areas under the first 
histograms are associated with the first object and linked to the first HTML document, 
and the first areas under the first histograms are then again associated with the second 
object and linked to the first HTML document. These limitations are inconsistent with 
the limitations: 

1. Claim 15 - "constructing a second histogram for each of the selected features 
using the second measurement information". 

2. Claim 41 - "means for constructing a second histogram for each of the 
selected features using the second measurement information". 

Therefore, the examiner assumes, for examining purposes, that a second area 
under the second histograms is calculated for the second object and is further 
linked to the second HTML document. 

Appropriate correction is required. 

Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

3. Claims 1, 2, 27, 28 and 40 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Abdel-Mottaleb et al., U.S. Patent No. 5,915,038 in view of 
Tsymbalenko et al., Jan. 9, 2001, "Using HTML Metadata to find relevant images on the 
world wide web". 

Claim 1 recites "retrieving a first object, wherein the first object comprises a first 
measurement information encoded in metadata elements of a first hypertext markup 
language (HTML) document". Abdel-Mottaleb discloses computing an index key 
(metadata) for a query (first) image, where the query (first) image is an image 
identified (retrieved) by the user for use in retrieving other similar images (column 3, 
lines 40-43; column 4, lines 38-43). Abdel-Mottaleb further discloses that the extraction 
of index keys is the extraction of metadata, where extracting metadata is the measurement 
of the features of the still images (column 7, lines 17-23). 

Claim 1 further recites "comparing the first object with a second object, wherein 
the second object comprises a second measurement information encoded in metadata 
elements of a second hypertext markup language (HTML) document". Abdel-Mottaleb 
discloses the comparison of the index key (metadata information) of the query (first) 
image with each index key of the corresponding image (second) being searched 
(column 3, lines 45-50). 

Claim 1 recites "retrieving the second object in response to the difference 
between the first measurement information of the first HTML document and the second 
measurement information of the second HTML document being less than or equal to a 
threshold difference value". Abdel-Mottaleb discloses the selection of the second image 
with respect to the threshold difference value (column 3, lines 50-60). 
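
For illustration only, the threshold test recited in claim 1 might be sketched as follows. The feature names, the sum-of-absolute-differences metric, the document names and the threshold value are assumptions made for this sketch; neither the claim nor Abdel-Mottaleb is quoted here as prescribing them.

```python
from typing import Dict, List


def measurement_difference(first: Dict[str, float], second: Dict[str, float]) -> float:
    """Sum of absolute differences over the features both objects share.

    The metric itself is an assumption; any distance measure could be substituted.
    """
    shared = first.keys() & second.keys()
    return sum(abs(first[name] - second[name]) for name in shared)


def retrieve_similar(query: Dict[str, float],
                     candidates: Dict[str, Dict[str, float]],
                     threshold: float) -> List[str]:
    """Return the ids of candidates whose difference from the query is <= threshold."""
    return [doc_id for doc_id, measurements in candidates.items()
            if measurement_difference(query, measurements) <= threshold]


# Hypothetical measurement information for a query object and two candidates.
query = {"red": 0.42, "green": 0.31}
candidates = {
    "a.html": {"red": 0.40, "green": 0.30},
    "b.html": {"red": 0.90, "green": 0.10},
}
matches = retrieve_similar(query, candidates, threshold=0.05)
```

In this sketch only the first candidate falls within the threshold, so it alone would be retrieved.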

Abdel-Mottaleb further discloses that the still images may be stored locally or 
remotely, such as at a remote Web site on the Internet, and archived using the index key 
extraction and archival process (column 8, lines 13-16). Abdel-Mottaleb does not 
explicitly teach the encoding of metadata information in metadata elements of an HTML 
document. 

However, Tsymbalenko discloses an image retrieval system in which the images 
are accessed through HTML documents. The bulk of the content of HTML documents is 
textual, and the textual content and the structure of the HTML documents are 
considered to be "metadata" describing the images; this metadata is used to 
determine which images may be relevant to a query (page 3, paragraph 2). 
Tsymbalenko further discloses a search strategy in which the result is an HTML 
document with links to other HTML documents that best match the query (page 3, 
paragraph 5), and every image is linked to each HTML document (page 3, para. 6, lines 
2-3). 

Therefore, it would have been obvious to one having ordinary skill in the art at the 
time the invention was made to include the teachings of Tsymbalenko in the 
invention of Abdel-Mottaleb. One would have been motivated to use Tsymbalenko's 
concept of encoding image metadata in HTML documents in the invention of 
Abdel-Mottaleb because both references are directed to image search and retrieval. 
Abdel-Mottaleb discloses that the still images may be stored locally or remotely, such as 
at a remote web site on the Internet, and Tsymbalenko provides image search 
and retrieval using HTML documents, in which each image index key (metadata) 
extracted by Abdel-Mottaleb's invention can be encoded in an HTML document as 
textual information. Using textual metadata would save memory and bandwidth 
in the image retrieval system. 

Claim 2 has been analyzed and rejected as per claim 1. 

Claim 27 additionally recites a system that retrieves images by content 
measure metadata encoding. Abdel-Mottaleb discloses the use of a computer system for 
his invention (column 6, lines 24-37). All other limitations in claim 27 have been analyzed 
and rejected as per claim 1. 

Claim 28 has been analyzed and rejected as per claims 1 and 27. 

Claim 40 has been analyzed and rejected as per claim 27. 

4. Claims 3, 9-12, 14, 29, 35-38 and 40 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Eakins et al., Jan. 1999, "Content-based image retrieval: a 
report to the JISC technology applications programme", in view of Opittek et al., U.S. 
Patent No. 3,979,555, and further in view of Tsymbalenko et al., Jan. 9, 2001, "Using 
HTML Metadata to find relevant images on the world wide web". 

Claim 3 recites "A method of encoding images by content measure metadata 
encoding, comprising: measuring selected features of an object to form measurement 
information; constructing a histogram for each of the selected features using the 
measurement information". Eakins discloses content-based image retrieval (CBIR) 
methods (page 18) and systems (page 23) where stored images are retrieved from a 
collection by comparing features automatically extracted from the images themselves 
where the commonest features used are mathematical measures of color, texture or 
shape (page 18, para. 3). Eakins further discloses measuring the proportion of pixels of 
each color within the image and a color histogram is computed to show this 
measurement (page 18, para. 4, lines 2-4). 

Claim 3 further recites "determining an area encompassed by each of the 
histograms". Eakins does not teach calculating an area encompassed by the histogram 
for each of the selected features. However, Opittek discloses calculating a total area 
under the histogram to compute the total distribution of gray scale intensity levels in the 
image (column 3, lines 13-20). Opittek discloses a histogram in figures 1-4 where the x-axis 
represents internal division of the selected feature (intensity of a color) and the y-axis 
represents a frequency of occurrence of intensity level. 

Therefore, it would have been obvious to one having ordinary skill in the art at the 
time the invention was made to use the concept of calculating the area under the 
histogram by Opittek in the invention of Eakins. One would have been motivated to do 
so because Eakins computes a color histogram for each color within the image, and 
Opittek further shows the histogram in standard x-y coordinate format in figure 1, with 
the x-axis representing internal division of the selected feature (intensity of a color) 
and the y-axis representing a frequency of occurrence of intensity level. Opittek then 
calculates a total area under this histogram to compute the total distribution of color 
intensity levels in the image, and calculating the area provides a mathematical value of 
the distribution of the color intensity levels. 
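
As a rough sketch of the two steps discussed above (building an intensity histogram and taking the total area under it), the following may help. The bin count, the 0-255 intensity range, the bin width and the sample pixel values are assumptions for illustration and are not taken from Eakins or Opittek.

```python
from typing import List, Sequence


def intensity_histogram(pixels: Sequence[int], bins: int = 4, max_level: int = 255) -> List[int]:
    """Count how many pixel intensities fall into each of `bins` equal-width intervals."""
    counts = [0] * bins
    width = (max_level + 1) / bins
    for value in pixels:
        index = min(int(value / width), bins - 1)  # clamp the top level into the last bin
        counts[index] += 1
    return counts


def area_under_histogram(counts: Sequence[int], bin_width: float = 1.0) -> float:
    """Total area of the histogram bars: sum of (frequency x bin width)."""
    return sum(counts) * bin_width


# Hypothetical gray-scale intensities for eight pixels.
pixels = [0, 10, 70, 130, 200, 255, 255, 64]
counts = intensity_histogram(pixels)
area = area_under_histogram(counts)
```

With a unit bin width the area equals the number of pixels counted, which is one way to read "total distribution" of intensity levels as a single mathematical value.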

Claim 3 further recites "encoding areas of the histograms in metadata elements 
of a hypertext markup language (HTML) document". Eakins discloses the encoding of 
parameters, such as the color space on which the histogram is based, in metadata 
elements of an XML (Extensible Markup Language) document. Eakins does not teach 
encoding in metadata elements of an HTML document. Encoding in metadata of HTML is 
well known in the art, as supported by Tsymbalenko et al., which further 
discloses that each image is associated with each HTML document, as explained in claim 
1. 
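
A minimal sketch of what encoding histogram areas in HTML metadata elements could look like is given below, using only the Python standard library. The `meta` names (`hist-area-red`, `hist-area-green`), the four-decimal format and the helper names are hypothetical; the cited references do not specify them.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict


def encode_meta(measurements: Dict[str, float]) -> str:
    """Write each measurement as a <meta name=... content=...> element in the document head."""
    lines = [
        f'<meta name="{escape(name, quote=True)}" content="{value:.4f}">'
        for name, value in measurements.items()
    ]
    return "<html><head>\n" + "\n".join(lines) + "\n</head><body></body></html>"


class MetaReader(HTMLParser):
    """Collect numeric name/content pairs from <meta> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, float] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            try:
                self.values[attributes["name"]] = float(attributes["content"])
            except (TypeError, ValueError):
                pass  # ignore non-numeric metadata


# Hypothetical histogram areas for two color features.
document = encode_meta({"hist-area-red": 0.5, "hist-area-green": 0.25})
reader = MetaReader()
reader.feed(document)
```

Reading the values back with the same parser shows that the measurement information survives the round trip through the HTML document.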

Claim 9 recites "A method of claim 4, wherein measuring selected features 
further comprises measuring a geometric feature of the object". Features such as 
shape, size, distance, rotation angle, etc., are standard types of geometric features, 
depending on the object type. Therefore, claim 9 has been analyzed and rejected as per 
claim 3. 

Claims 10 and 11 have been analyzed and rejected as per claim 9. 

Claim 12 has been analyzed and rejected as per claim 3. 

Claim 14 has been analyzed and rejected as per claim 3 and further in view of 
claim 1. 

Claim 29 recites a system that performs the method as recited in claim 3. Eakins 
further discloses many content-based image retrieval systems and these systems use 
the same method as recited in claim 3 (page 23; page 18, para. 3, lines 3-4). Therefore, 
claim 29 has been analyzed and rejected as per claim 3. 

Claim 35 has been analyzed and rejected as per claim 29 and further in view of 
claims 3 and 9. 

Claims 36 and 37 have been analyzed and rejected as per claim 35. 

Claim 38 has been analyzed and rejected as per claim 29 and further in view of 
claims 3 and 12. 

Claim 40 has been analyzed and rejected as per claim 29 and in further view of 
claims 3, 14 and 1. 

5. Claims 4-8 and 30-34 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Eakins et al., Jan. 1999, "Content-based image retrieval: a report to the JISC 
technology applications programme", in view of Opittek et al., U.S. Patent No. 
3,979,555, and further in view of Tsymbalenko et al., Jan. 9, 2001, "Using HTML 
Metadata to find relevant images on the world wide web", and further in view of Baxes, 
1994, Book Publication, "Digital image processing: principles and applications". 

Claim 4 recites "A method of claim 3, wherein measuring selected features 
further comprises measuring an intensity of a preselected color of the object". As 
explained in rejection of claim 3, the combined invention of Eakins and Opittek 
determines intensity levels of each color based on the histograms. Therefore claim 4 
has been analyzed and rejected as per claim 3. 

It is well known in the art that each color can be red, green or blue, as in RGB 
space, and that each color can also be a gray color in a gray-scale image. This well-known 
art is further supported by Baxes, who, in his book printed in 1994, discloses that three 
histograms can be generated in an RGB space, one for each color component (page 63, 
para 3; page 64, para 1; and figures 3.20(a), 3.20(b) and 3.20(c)). Therefore, it would have 
been obvious to one having ordinary skill in the art at the time the invention was 
made to use the concept of constructing histograms for a preselected color as disclosed 
by Baxes. One would have been motivated to use the concept of Baxes because each 
histogram can help to determine the brightness distributions, contrast, and dynamic 
ranges of the individual components, where brightness is the measured intensity of the 
pixels in the image (page 3). 
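
The idea of one histogram per color component in RGB space can be illustrated with the short sketch below. The bin count and the sample pixels are assumptions, and the code is not drawn from Baxes.

```python
from typing import Dict, Iterable, List, Tuple


def channel_histograms(pixels: Iterable[Tuple[int, int, int]], bins: int = 4) -> Dict[str, List[int]]:
    """Build a separate intensity histogram for the red, green and blue components."""
    width = 256 / bins
    histograms = {"red": [0] * bins, "green": [0] * bins, "blue": [0] * bins}
    for r, g, b in pixels:
        for name, value in (("red", r), ("green", g), ("blue", b)):
            index = min(int(value / width), bins - 1)
            histograms[name][index] += 1
    return histograms


# Hypothetical 8-bit RGB pixels.
rgb_pixels = [(255, 0, 0), (200, 30, 10), (10, 250, 128)]
per_channel = channel_histograms(rgb_pixels)
```

Selecting one of the three resulting histograms corresponds to measuring the intensity of a preselected color.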

Claims 5, 6, 7, and 8 have been analyzed and rejected as per claim 4. 

Claims 30, 31, 32, 33 and 34 have been analyzed and rejected as per claim 29 and in 
further view of claims 4, 5, 6, 7 and 8. 

6. Claims 13 and 39 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Eakins et al., Jan. 1999, "Content-based image retrieval: a report to the JISC 
technology applications programme", in view of Opittek et al., U.S. Patent No. 
3,979,555, and further in view of Tsymbalenko et al., Jan. 9, 2001, "Using HTML 
Metadata to find relevant images on the world wide web" and further in view of Rorvig, 
"An experimental approach for the content-based image analysis: an open source 
agenda for research". 

Claim 13 recites "a method of claim 3, further comprising converting the area 
under the histogram to a Lorenz Information Measure (LIM)". Eakins, Opittek and 
Tsymbalenko do not teach converting the area under the histogram to a Lorenz 
Information Measure (LIM). However, Rorvig teaches converting the area under the 
histogram to a Lorenz Information Measure (pages 14 and 15). Using LIM to convert the 
area under the histogram is well known in the field of distributional analysis, such as 
economics. 

Therefore, it would have been obvious to one having ordinary skill in the art at the 
time the invention was made to use the invention of Rorvig in the combined invention 
of Eakins, Opittek and Tsymbalenko. One would have been motivated to do so because 
applying Rorvig's concept of LIM to the area encompassed by the histogram converts 
the histogram values to a single value, and this single value can then be encoded as 
metadata name tags in HTML as per Tsymbalenko, rather than attempting to use all of 
the individual data in the histogram. 
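
To make the "single value" point concrete, the sketch below reduces a histogram to one number using a common formulation of a Lorenz-type measure: bin probabilities are sorted in ascending order, accumulated, and the area under the resulting piecewise-linear curve is taken. This formulation is an assumption for illustration; it is not confirmed to be the exact calculation described on the cited pages of Rorvig.

```python
from typing import Sequence


def lorenz_information_measure(counts: Sequence[float]) -> float:
    """Area under the Lorenz curve of the sorted, normalized histogram (assumed formulation)."""
    if not counts:
        return 0.0
    total = sum(counts)
    if total == 0:
        return 0.0
    probabilities = sorted(count / total for count in counts)
    n = len(probabilities)
    area = 0.0
    cumulative = 0.0
    for p in probabilities:
        # Trapezoid of width 1/n between the previous and the new cumulative share.
        area += (cumulative + (cumulative + p)) / 2 / n
        cumulative += p
    return area


uniform = lorenz_information_measure([1, 1, 1, 1])   # evenly spread histogram
peaked = lorenz_information_measure([0, 0, 0, 4])    # all mass in a single bin
```

Under this formulation an evenly spread histogram gives the largest value (0.5) and a histogram concentrated in one bin gives a smaller one, so a whole histogram collapses to one scalar that could be stored in a single metadata element.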

Claim 39 has been analyzed and rejected as per claim 29 and in further view of 
claim 13. 

7. Claims 15, 21-24, 26, 41, 47-50 and 52 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Abdel-Mottaleb et al., U.S. Patent No. 5,915,038, in view of 
Eakins et al., Jan. 1999, "Content-based image retrieval: a report to the JISC 
technology applications programme", and further in view of Opittek et al., U.S. Patent 
No. 3,979,555, and further in view of Tsymbalenko et al., Jan. 9, 2001, "Using HTML 
Metadata to find relevant images on the world wide web". 

Claim 15 recites a method of retrieving images by content measure metadata 
encoding. As explained in the rejection of claim 1, Abdel-Mottaleb extracts (measures) 
an index key (metadata) for the query (first) image and then generates the index key for 
each of the images to be searched, where extracting metadata is the measurement of 
the features of the still images, and further discloses the selection of the second image 
with respect to the threshold difference value. Abdel-Mottaleb does not go into the 
details of constructing a histogram for each feature measured, generating the area under 
each histogram, and encoding the areas under the histograms in metadata elements of an 
HTML document. 

However, the invention of Eakins combined with Opittek and Tsymbalenko, as 
explained in the rejection of claim 3, provides the detailed steps of the image analysis 
required on the query (first) image and all other images to be searched, and further 
provides the concept of encoding metadata in metadata elements of an HTML 
document. Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the invention was made to use the combined invention of Eakins, Opittek 
and Tsymbalenko in the invention of Abdel-Mottaleb. One would have been motivated 
to do so for the following reasons. Abdel-Mottaleb, Eakins, and Tsymbalenko are all 
directed to methods and systems of image search and retrieval. Eakins goes into more 
detail on using content features in image search and retrieval to optimize the search 
techniques and obtain better image similarity results. Opittek provides the 
concept of image analysis on the selected image features, such as obtaining a total 
mathematical measure (metadata) of the color intensity distribution. Tsymbalenko 
supports Abdel-Mottaleb's method of searching images on the web by encoding 
the metadata of the images in the metadata elements of the HTML document, as 
explained in the rejection of claim 1. 



Application/Control Number: 10/087,347 Page 14 

Art Unit: 2625 

Claim 21 has been analyzed and rejected as per claim 15 and further in view of 
claims 3 and 9. 

Claims 22 and 23 have been analyzed and rejected as per claim 21. 

Claim 24 has been analyzed and rejected as per claim 15 and further in view of 
claims 3 and 14. 

Claim 26 has been analyzed and rejected as per claim 15 and further in view of 
claims 14, 3 and 1. 

Claim 41 recites a system that performs the method recited in claim 15. Claim 41 
has been analyzed and rejected as per claim 27. 

Claim 47 has been analyzed and rejected as per claim 41 and in further view of 
claim 21. 

Claims 48 and 49 have been analyzed and rejected as per claim 47. 

Claim 50 has been analyzed and rejected as per claim 41 and further in view of 
claim 24. 

Claim 52 has been analyzed and rejected as per claim 41 and further in view of 
claim 26. 

8. Claims 16-20 and 42-46 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Abdel-Mottaleb et al., U.S. Patent No. 5,915,038, in view of Eakins et 
al., Jan. 1999, "Content-based image retrieval: a report to the JISC technology 
applications programme", and further in view of Opittek et al., U.S. Patent No. 
3,979,555, and further in view of Tsymbalenko et al., Jan. 9, 2001, "Using HTML 
Metadata to find relevant images on the world wide web", and further in view of Baxes, 
1994, Book Publication, "Digital image processing: principles and applications". 

Claim 16 recites "A method of claim 15, wherein measuring selected features 
further comprises measuring an intensity of a preselected color of the object". 
Abdel-Mottaleb does not teach measuring the intensity of a preselected color of the object, but 
measuring the intensity of a color, with motivation, has been explained in the rejection of 
claim. Claim 16 has been analyzed and rejected as per claim 15 and in further view 
of claim 4. 

Claims 17-20 have been analyzed and rejected as per claims 16 and 15 and 
further in view of claims 5-8. 

Claims 42-46 have been analyzed and rejected as per claim 41 and 
in further view of claims 16-20. 

9. Claims 25 and 51 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Abdel-Mottaleb et al., U.S. Patent No. 5,915,038, in view of Eakins et al., Jan. 
1999, "Content-based image retrieval: a report to the JISC technology applications 
programme", and further in view of Opittek et al., U.S. Patent No. 3,979,555, and further 
in view of Tsymbalenko et al., Jan. 9, 2001, "Using HTML Metadata to find relevant 
images on the world wide web" and further in view of Rorvig, "An experimental 
approach for the content-based image analysis: an open source agenda for research". 

Claim 25 recites "a method of claim 15, further comprising converting the area 
under the histogram to a Lorenz Information Measure (LIM)". Abdel-Mottaleb, Eakins, 
Opittek and Tsymbalenko do not teach converting the area under the histogram to a 
Lorenz Information Measure (LIM). However, Rorvig teaches converting the area under 
the histogram to a Lorenz Information Measure (pages 14 and 15), as explained, with 
motivation, in the rejection of claim 13. Therefore, claim 25 has been analyzed and rejected as per 
claim 25 and in further view of claim 13. 

Claim 51 has been analyzed and rejected as per claim 25. 



Conclusion 

10. The prior art of record and not relied upon is considered pertinent to applicant's 
disclosure: 

• Kim et al., U.S. Patent No. 6,754,667, discloses a content based image retrieval 
system and method of retrieving image using the same. 

• Tsujimura et al., U.S. Patent No. 5,586,197, discloses image searching method 
and apparatus thereof using color information of an input image. 

• Mukherjea et al., U.S. Patent No. 6,415,282, discloses a method and apparatus 
for query refinement. 

• Yoon et al., U.S. Patent No. 6,621,926, discloses an image retrieval system and 
method using image histogram. 

• Barber et al., U.S. Patent No. 5,579,471, discloses an image content based image 
query system and method. 

• Golshani et al., U.S. Patent No. 6,594,386, discloses a method for computerized 
indexing and retrieval of digital images based on spatial color distribution. 

• Murakawa, U.S. Patent No. 6,463,432, discloses an apparatus for and method 
for retrieving images. 

• Ito et al., U.S. Patent No. 5,555,318, discloses a thresholding method for 
segmenting gray scale image, method for determining background concentration 
distribution, and image displacement detection method, and also discloses the 
concept of area under the histogram. 

• Stapleton et al., U.S. Patent No. 5,832,140, discloses automated quality 
assurance image processing system and further provides the concept of area 
under histogram. 

• Wang, U.S. Patent No. 6,373,979, discloses a system and method for 
determining a level of similarity among more than one image and a segmented 
data structure for enabling such determination. 

• Shimura et al., U.S. Patent No. 5,644,765, discloses an image retrieving method 
and apparatus that calculates characteristic amounts of data correlated with and 
identifying an image. 

• Jain et al., U.S. Patent No. 5,893,095, discloses a similarity engine for 
content-based retrieval of images. 

• Jain et al., U.S. Patent No. 5,915,250, discloses a threshold based comparison 
system and method for image search and retrieval. 

• Tang et al., IEEE Publication, 2000, "A content-based image retrieval system on 
the mode of network", (pp. 422-425) 

• Pearce et al., IEEE Publication, 1994, "Theoretical and experimental comparison 
of the Lorenz Information Measure, Entropy, and the Mean Absolute Error", (pp. 
24-29) 

• Pass et al., IEEE Publication, 1996, "Histogram refinement for content-based 
image retrieval", (pp. 96-102) 

• Yuwono et al., IEEE Publication, 1996, "Search and ranking algorithms for 
locating resources on the world wide web", (pp. 164-171) 

• Lee et al., IEEE Publication, 1994, "Query by image content using multiple 
objects and multiple features: user interface issues", (pp. 76-80) 

• Pala et al., IEEE Publication, 2000, "Using multiple examples for content-based 
image retrieval", (pp. 335-338). 

• Newsome et al., IEEE Publication, 1997, "HyperSQL: Web-based query interface 
for biological databases", (pp. 329-339). 

• Chen et al., IEEE Publication, 1999, "A synchronized and retrieval video/HTML 
lecture system for industry employee training", (pp. 750-755). 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Manav Seth whose telephone number is (703) 306- 
4117. The examiner can normally be reached on Monday to Friday from 8:30 am to 
5:00 pm. 

If attempts to reach the examiner by telephone are unsuccessful, examiner's trainer, 
Bhavesh Mehta, can be reached on (703) 305-3885. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published 
applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). 




REVISED 

Page 1 of 1 

Form PTO-1449 (modified) 
List of Patents and Publications for Applicant's 
INFORMATION DISCLOSURE STATEMENT 
(Use several sheets if necessary) 

[Stamp: NOV 28 2003] 
[Stamp: DEC 02 2003, Technology Center 2600] 

Atty. Docket No.: 4380.001300/MWS 
Applicant: Mark E. Rorvig 
Filing Date: March 1, 2002 
Serial No.: 10/087,347 
Group: 2621 

U.S. Patent Documents: See Page 1 
Foreign Patent Documents: See Page 1 
Other Art: See Page 1 

U.S. Patent Documents 
(Columns: Exam. Init.; Ref. Des.; Document Number; Date; Name; Class; Sub Class; Filing Date of App.) 
A1 to A5: no entries 

Foreign Patent Documents 
(Columns: Exam. Init.; Ref. Des.; Document Number; Date; Country; Class; Sub Class; Translation Yes/No) 
B1 to B3: no entries 

Other Art (Including Author, Title, Date, Pertinent Pages, Etc.) 
(Columns: Exam. Init.; Ref. Des.; Citation) 

C1  Rorvig, "Content Based Image Retrieval Enhanced by Integration of Metadata Encoded Image 
    and Text Features," Texas Center for Digital Knowledge, The University of North Texas 

C2  Jeong et al., "Image Retrieval by Content Measure Metadata Coding," CIR 2001, Tenth 
    International World Wide Web Conference, May 1-5, 2001, Hong Kong, International World 
    Wide Web Conference Committee: http://www.iw3c2.org/Conferences/Welcome.html, 
    http://www10.org/cdrom/posters/p1142/index.htm 

C3  Goodrum et al., "An Open Source Agenda for Research Linking Text and Image Content 
    Features," Journal of the American Society for Information Science, Vol. 52, pp. 948-953 
    (September 2001), http://www.unt.edu/ir/papers/iasisfiOO.htm 

C4  Rorvig, "An Experimental Approach for the Content-Based Image Analysis: An Open Source 
    Agenda for Research," The Department of Information Science, The University of North Texas, 
    pages 1-25 

Examiner: 

Date Considered: 

EXAMINER: Initial if reference considered, whether or not citation is in conformance with MPEP 609; draw line through 
citation if not in conformance and not considered. Include copy of this form with next communication to applicant. 

Information Disclosure Statement - PTO-1449 (Modified) 






Page 1 of 2 

Form PTO-1449 (modified) 
List of Patents and Publications for Applicant's 
INFORMATION DISCLOSURE STATEMENT 
(Use several sheets if necessary) 

[Stamp: RECEIVED DEC 02 2003, Technology Center 2600] 

Atty. Docket No.: 4380.001300/KDG 
Serial No.: 10/087,347 
Applicant: Mark E. Rorvig 
Filing Date: March 1, 2002 
Group: 2621 

U.S. Patent Documents: See Page 1 
Foreign Patent Documents: See Page 1 
Other Art: See Page 1 

U.S. Patent Documents 
(Columns: Exam. Init.; Ref. Des.; Document Number; Date; Name; Class; Sub Class; Filing Date of App.) 
A1 to A5: no entries 

Foreign Patent Documents 
(Columns: Exam. Init.; Ref. Des.; Document Number; Date; Country; Class; Sub Class; Translation Yes/No) 
B1 to B3: no entries 

Other Art (Including Author, Title, Date, Pertinent Pages, Etc.) 
(Columns: Exam. Init.; Ref. Des.; Citation) 

C5  Rorvig et al., "Content Based Image Retrieval by Integration of Metadata Encoded Multimedia 
    (Image and Text) Features," CIR 2002, Eleventh International World Wide Web Conference, 
    May 6-11, 2002, Honolulu, Hawaii, USA, International World Wide Web Conference 
    Committee: http://www.iw3c2.org/Conferences/Welcome.html, 
    http://www2002.org/CDROM/poster/153.pdf 

C6  Rorvig et al., "Exploiting Image Primitives for Effective Retrieval," CIR 2000, Third UK 
    Conference on Image Retrieval, May 4-5, 2000, Old Ship Hotel, Brighton, UK, University of 
    Brighton: School of Information Management 

C7  Rorvig et al., "A Common Representation Format for Multimedia Documents," Texas Center 
    for Digital Knowledge, School of Library and Information Sciences, University of North Texas, 
    Denton, Texas (2000) 

C8  Bergman et al., "Progressive Content-Based Retrieval from Satellite Image Archives," D-Lib 
    Magazine (October 1997), http://www.dlib.org/dlib/october97/ibm/10li.html 

Examiner: 

Date Considered: 

Information Disclosure Statement - PTO-1449 (Modified) 



Page 2 of 2 

Form PTO-1449 (modified) 
List of Patents and Publications for Applicant's 
INFORMATION DISCLOSURE STATEMENT 
(Use several sheets if necessary) 

[Stamp: NOV 28 2003] 
[Stamp: RECEIVED DEC 02 2003, Technology Center 2600] 

Atty. Docket No.: 4380.001300/KDG 
Serial No.: 10/087,347 
Applicant: Mark E. Rorvig 
Filing Date: March 1, 2002 
Group: 2621 

U.S. Patent Documents: See Page 1 
Foreign Patent Documents: See Page 1 
Other Art: See Page 1 

Other Art (Including Author, Title, Date, Pertinent Pages, Etc.) 
(Columns: Exam. Init.; Ref. Des.; Citation) 

C9  Eakins et al., "Content-Based Image Retrieval: A Report to the JISC Technology Applications 
    Programme," Northumbria Image Data Research Institute (January 1999), 
    http://www.unn.ac.uk/iidr/research/cbir/report.html 

C10 no entry 

Examiner: 

Date Considered: 

Information Disclosure Statement - PTO-1449 (Modified) 



Notice of References Cited 


Application/Control No. 
10/087,347 


Applicant(s)/Patent Under Reexamination 
RORVIG ET AL. 


Examiner 
Manav Seth 


Art Unit 
2625 


Page 1 of 3 



U.S. PATENT DOCUMENTS 

Ref.  Document Number                  Date       Name                    Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
A     US-5,915,038                     06-1999    Abdel-Mottaleb et al.   382/209
B     US-3,979,555                     09-1976    Opittek et al.          348/672
C     US-6,754,667                     06-2004    Kim et al.              707/102
D     US-5,586,197                     12-1996    Tsujimura et al.        382/162
E     US-6,415,282                     07-2002    Mukherjea et al.        707/3
F     US-6,621,926                     09-2003    Yoon et al.             382/168
G     US-5,579,471                     11-1996    Barber et al.           715/700
H     US-6,594,386                     07-2003    Golshani et al.         382/166
I     US-6,463,432                     10-2002    Murakawa, Akira         707/5
J     US-5,555,318                     09-1996    Ito et al.              382/168
K     US-5,832,140                     11-1998    Stapleton et al.        382/298
L     US-6,373,979                     04-2002    Wang, Jia               382/165
M     US-5,644,765                     07-1997    Shimura et al.          707/104.1



FOREIGN PATENT DOCUMENTS 

Ref.  Document Number                  Date       Country  Name  Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
N-T   (no entries)


NON-PATENT DOCUMENTS 

(Include as applicable: Author, Title, Date, Publisher, Edition or Volume, Pertinent Pages) 

U  Tsymbalenko et al., January 2001, "Using HTML Metadata to find relevant images on the world wide web", (pp. 1-9) 

V  Gregory Baxes, 1994, Book Publication, "Digital Image Processing: Principles and Applications". 

W  Tang et al., IEEE Publication, 2000, "A content-based image retrieval system on the mode of network", (pp. 422-425) 

X  Pearce et al., IEEE Publication, 1994, "Theoretical and experimental comparison of the Lorenz Information Measure, Entropy, 
   and the Mean Absolute Error", (pp. 24-29) 



*A copy of this reference is not being furnished with this Office action. (See MPEP § 707.05(a).) 
Dates in MM-YYYY format are publication dates. Classifications may be US or foreign. 



U.S. Patent and Trademark Office 

PTO-892 (Rev. 01-2001) 



Notice of References Cited 



Part of Paper No. 03012002 







Application/Control No. 
10/087,347 


Applicant(s)/Patent Under Reexamination 
RORVIG ET AL. 


Examiner 
Manav Seth 


Art Unit 
2625 


Page 2 of 3 



U.S. PATENT DOCUMENTS 

Ref.  Document Number                  Date       Name          Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
A     US-5,893,095                     04-1999    Jain et al.   707/6
B     US-5,915,250                     06-1999    Jain et al.   707/100
C-M   (no entries)




FOREIGN PATENT DOCUMENTS 

Ref.  Document Number                  Date       Country  Name  Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
N-T   (no entries)


NON-PATENT DOCUMENTS 

(Include as applicable: Author, Title, Date, Publisher, Edition or Volume, Pertinent Pages) 

U  Pass et al., IEEE Publication, 1996, "Histogram refinement for content-based image retrieval", (pp. 96-102) 

V  Yuwono et al., IEEE Publication, 1996, "Search and ranking algorithms for locating resources on the world wide web", (pp. 164-171) 

W  Lee et al., IEEE Publication, 1994, "Query by image content using multiple objects and multiple features: user interface issues", 
   (pp. 76-80) 

X  Pala et al., IEEE Publication, 2000, "Using multiple examples for content based image retrieval", (pp. 335-338) 










Application/Control No. 
10/087,347 


Applicant(s)/Patent Under Reexamination 
RORVIG ET AL. 


Examiner 
Manav Seth 


Art Unit 
2625 


Page 3 of 3 



U.S. PATENT DOCUMENTS 

Ref.  Document Number                  Date       Name  Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
A-M   (no entries)








FOREIGN PATENT DOCUMENTS 

Ref.  Document Number                  Date       Country  Name  Classification
      (Country Code-Number-Kind Code)  (MM-YYYY)
N-T   (no entries)












NON-PATENT DOCUMENTS 

(Include as applicable: Author, Title, Date, Publisher, Edition or Volume, Pertinent Pages) 

U  Newsome et al., IEEE Publication, 1997, "HyperSQL: Web-based query interface for biological databases", (pp. 329-339) 

V  Chen et al., IEEE Publication, 1999, "A synchronized and retrievable video/HTML lecture system for industry employee training", 
   (pp. 750-755) 

W-X  (no entries)




Using HTML Metadata to Find Relevant Images 
on the World Wide Web* 

Yelena Tsymbalenko 
Ethan V. Munson 
Dept. of EECS 
University of Wisconsin-Milwaukee 

Milwaukee, WI 53201 USA 
{yelena, munson}@cs.uwm.edu 

January 9, 2001 



1 Introduction 

The World Wide Web has become one of the largest information repositories the world 
has ever seen. While much of that information is textual, a substantial amount is drawn 
from other media, especially static images. 

An obvious way to use the Web is to treat it as a sort of library that can be indexed, 
catalogued, and queried. Web search engines and directory-style portals do just that by 
indexing or cataloguing the Web's textual content to provide convenient access services 
to end-users. 

Providing search services for the Web's image content has been more difficult. 
A number of researchers have developed Web image search tools, but these systems 
are limited by the use of visually-based queries and databases that represent a small 
subset of the Web's content. Companies like Alta Vista and Lycos Multimedia are also 
beginning to provide image search services. Some features of Alta Vista's image search 
system suggest that their technology is similar to that reported in this paper. 

This paper describes a new approach to finding images on the Web. Instead of 
analyzing the images themselves using image processing techniques, our software ex- 
amines the HTML source code that refers to the image and using only this textual 
information, decides whether or not an image is relevant to a query. 

In the next section, we provide some background on prior research into image 
search and multimedia research using similar strategies. In Section 3, we describe 
the software testbed we built, while in Section 4 we describe the results of experiments 
that we ran using this testbed. Section 5 closes with conclusions and suggestions for 
future research. 

*This research was sponsored by the U.S. Department of Defense. Ethan V. Munson was also supported 
by NSF CAREER award CCR-9734102. 
2 Background 



There is a large body of research on multimedia indexing and retrieval. Most of this 
research has been performed using closed databases whose content was under the di- 
rect control of the researchers. Examples of such research are easily found in recent 
conference proceedings [2, 1] and journals [8]. 

A good example of this course of research is the IBM Almaden Center's Query-By- 
Image-Content (QBIC) system [6]. QBIC allows users to make image queries based 
on image features such as shape, color, texture, and object layout. Users define queries 
either by providing a sample image or by using a graphical tool to make a sketch or 
diagram. QBIC has a well-developed visual query language and an interesting GUI. Its 
use of image features for indexing and querying is both an advantage and a disadvan- 
tage. When users are seeking images with a particular appearance (e.g. mix of colors, 
object with a particular shape), it is very helpful. When users are looking for pictures 
of particular content, it is less helpful because the low-level image characteristics give 
only limited insight into the real semantics of images. For example, a person's facial 
appearance may be fairly constant, but their clothing may not be. Many objects (e.g. 
pencils, fish, or motorcycles) look radically different depending on camera viewpoint. 
QBIC's approach also appears ill-suited to the scale of the Web. QBIC constructs its in- 
dices by pre-analyzing each image in its database. This is computationally demanding 
and it is difficult to see how it can be done for the Web as a whole. 

WebSeek [10] is a more direct attempt to create a directory and database of images 
from the Web. WebSeek uses a mix of automated and manual techniques to create 
a database of images downloaded from the Web. It automatically inspects HTML 
documents, extracting keywords from the image file names that are used to create a 
histogram of file names. This histogram is used to manually construct a subject hier- 
archy for the downloaded images. In another manual step, the downloaded images are 
mapped into the subject hierarchy. Once this is done, WebSeek users can browse the 
categories in the subject hierarchy, search the categories by keyword, and search the 
database using image features, especially color histogram information. 

WebSeek has a large database of Web images and supports both text-based and 
image-based queries. Text-based queries have more semantic content than image-based 
queries. However, it seems unlikely that WebSeek's database can approach the scale 
of the entire Web, since manual categorization of images is a slow and labor-intensive 
process. 

WebSeer [7] is the system most closely related to this research. The principal in- 
vestigator, Swain, now works for Alta Vista. Alta Vista has a new image search tool 
whose qualities appear to derive from Swain's research on WebSeer. 

The goal of research on WebSeer was to classify images into categories such as 
photographs, portraits and computer-generated drawings. To do this, WebSeer supple- 
mented information from image content analysis with information from HTML meta- 
data. WebSeer used several kinds of HTML metadata including the file names of im- 
ages, the text of the ALT attribute of the IMG tag, and the text of hyperlinks to images 
to help identify relevant images. Since the WebSeer research emphasized image cate- 
gorization, this use of metadata is not discussed in detail in any of the WebSeer papers. 
We assume that the metadata was helpful, but a detailed analysis was not provided. 



The research most similar in spirit to that reported in this paper was conducted 
by Brown et al. [3, 4]. In their first study [3], they used textual "closed captions" 
transmitted with broadcast news to index stored video. In the second study [4], they 
used speech recognition techniques to analyze the audio components of video mail. 
Then, the textual content of the recognized speech was used for indexing the video 
content. In both studies, Brown et al. took advantage of the fact that data in two media 
were traveling together and exploited data in one medium to better understand the 
content in the other. 



3 Image Search Architecture 

We applied the cross-media indexing strategy of Brown et al. to the Web image search 
problem. We started with the observation that images on the Web are almost always 
accessed through HTML documents and that the bulk of the content of HTML docu- 
ments is textual. In addition, the HTML source includes text that defines a hierarchical 
information structure. [illegible] HTML 
documents [illegible] and use this [illegible] to determine 
which images may be relevant to a query. 

The second aspect of our strategy was to exploit existing Web search engines in 
order to search the entire Web, rather than a closed database of previously downloaded 
images. By using existing search engines, we saved considerable engineering effort and 
were able to exploit the search engine designers' considerable expertise in computing 
the relevance of Web documents to textual queries. 

We constructed a Web image search application composed of four modules: text 
search, document download and cleaning, document analysis, and search results inter- 
face. 

The text search module accepted a one-word query and sent it to the Alta Vista 
search engine. Alta Vista returned an HTML document with links to ten Web pages 
that best matched the query. In addition, the bottom of this document had links to as 
many as nineteen other pages of search results. [illegible] The text search module 
extracted the URLs of these pages from the search results documents and sent a subset 
of these URLs to the document download and cleaning module. 
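
The URL-extraction step described above can be illustrated with a minimal sketch. It assumes Python's standard `html.parser` module and a made-up results page; the names `LinkExtractor`, `extract_result_urls`, and `sample_page` are hypothetical and are not taken from the authors' software, whose implementation the paper does not describe.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every anchor (A) element."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling us.
        if tag == "a":
            href = dict(attrs).get("href")
            # Keep only absolute links; relative ones are navigation links
            # in this made-up example.
            if href and href.startswith("http"):
                self.urls.append(href)


def extract_result_urls(results_html, limit=10):
    """Return up to `limit` absolute URLs found in a results page."""
    parser = LinkExtractor()
    parser.feed(results_html)
    return parser.urls[:limit]


# Hypothetical results page, not real search engine output.
sample_page = (
    "<html><body>"
    '<A HREF="http://example.org/paris.html">Paris</A>'
    '<a href="/next">next page</a>'
    '<a href="http://example.org/louvre.html">Louvre</a>'
    "</body></html>"
)
result_urls = extract_result_urls(sample_page)
print(result_urls)
```

The `limit` parameter stands in for the "subset of these URLs" mentioned in the text; how the authors chose that subset is described only in Section 4.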

The download and cleaning module first used the low-level HTTP interfaces to 
download the Web pages for each URL. In addition, this module downloaded every 
image referenced by each document, in order to facilitate later analysis. At this point, 
we must comment that many HTML documents on the Web are ill-formed and 
thus are difficult to analyze. We solved this problem by using the "Tidy" application [9] 
developed by Raggett for the Web Consortium. Tidy uses heuristic rules to translate 
HTML (well-formed or ill-formed) to well-formed XHTML (an analog of HTML that 
conforms to the XML specification [5]). 
The document analysis module parsed the well-formed XHTML documents into 
an internal tree representation and then searched for "clues" that might indicate that an 
image in the document matched the query. The analysis module considered an image 
to match the query if the query appeared in any of the following eight places: 
1. An image's file name; 

2. The textual content of the document's TITLE element; 

3. The value of the ALT attribute of the IMG element; 

4. The textual content of an anchor (A) element whose target was the image's file; 

5. The value of the TITLE attribute of an anchor (A) element; 

6. The textual content of the paragraph that was the parent of the IMG element; 

7. The textual content of any paragraph located within the same CENTER element 
as the IMG element; and 

8. The textual content of heading elements that precede the image. 
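
A minimal sketch of how a few of these clues could be checked against a parsed XHTML tree follows. It covers only clues 1, 2, 3, and 6, assumes lowercase, namespace-free XHTML, and uses Python's `xml.etree.ElementTree`; the function name `matching_clues` and the sample page are hypothetical, not the authors' code.

```python
import xml.etree.ElementTree as ET


def matching_clues(xhtml, query):
    """Map each image's src to the sorted clue numbers that contain the query."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    q = query.lower()

    title = root.find(".//title")
    title_text = "".join(title.itertext()).lower() if title is not None else ""

    results = {}
    for img in root.iter("img"):
        src = img.get("src", "")
        clues = set()
        if q in src.rsplit("/", 1)[-1].lower():   # clue 1: image file name
            clues.add(1)
        if q in title_text:                        # clue 2: TITLE element text
            clues.add(2)
        if q in img.get("alt", "").lower():        # clue 3: ALT attribute
            clues.add(3)
        parent = parent_of.get(img)                # clue 6: parent paragraph text
        if parent is not None and parent.tag == "p":
            if q in "".join(parent.itertext()).lower():
                clues.add(6)
        results[src] = sorted(clues)
    return results


# Hypothetical page used only for illustration.
page = (
    "<html><head><title>Sunset over Spokane</title></head><body>"
    '<p>A quiet evening. <img src="pics/spokane01.jpg" alt="river at dusk"/></p>'
    '<div><img src="pics/logo.gif" alt="Spokane tourism"/></div>'
    "</body></html>"
)
clue_hits = matching_clues(page, "spokane")
print(clue_hits)
```

Substring matching on lowercased text is an assumption made here for brevity; the paper does not say how the query was compared with each text source.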

Finally, the search results interface module took the list of matching images gener- 
ated by the document analysis module and created a Web page interface with links to 
the matching images and the pages that they came from. This final interface was not 
designed for end-users, who would certainly prefer an interface based on thumbnail 
images, but it was suitable for our image search experiments. 

At this point, some comments on the design of the testbed are appropriate. 

• By using a commercial search engine as the first step in image search, we saved 
a tremendous amount of engineering effort. However, it clearly makes the set of 
images returned by the system depend on the behavior of the search engine. At 
this time, we have no idea what effect the choice of search engine had on our 
research. 

• The eight "clues" used to find matching images were derived from the work 
on WebSeer and from our own study of the HTML specification and of Web 
document design practice. 

• About 1% of the HTML documents we downloaded were so ill-formed that the 
Tidy program could not produce an XHTML version. 

• We determined that images smaller than 65 pixels in either the horizontal or 
vertical dimension could be ignored. We found through informal experimenta- 
tion that such images were essentially always "decorative" elements like borders, 
bullets, or banner advertisements. 
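
The size rule in the last comment can be written as a short filter. This is a sketch only: the 65-pixel threshold comes from the text, while the function name `is_decorative` and the sample image sizes are made up for illustration.

```python
MIN_SIDE_PIXELS = 65  # threshold reported in the text


def is_decorative(width, height, min_side=MIN_SIDE_PIXELS):
    """True when either dimension falls below the threshold."""
    return width < min_side or height < min_side


# Hypothetical (name, width, height) records.
images = [("banner.gif", 468, 60), ("bullet.gif", 12, 12), ("paris.jpg", 320, 240)]
kept = [name for name, w, h in images if not is_decorative(w, h)]
print(kept)
```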

4 Image Search Experiment 

Using the testbed described in the previous section, we conducted an image search 
experiment in the fall of 1999 to assess the effectiveness of our strategy. Our goal was 
to answer two research questions: 

• Which HTML features reveal the most information about images in a document? 
• Do image search results depend on the type of query made? 



We used our testbed to search for images using twelve one-word queries drawn 
from five categories. The queries, listed by category, were: 

Famous People: "Gorbachev," "Yeltsin," and "Streisand" 

Non-famous People: "Yelena" and "Ekaterina" 

Famous Places: "Paris" and "London" 

Less-famous Places: "Bremen" and "Spokane" 

Phenomena: "Explosion," "Sunset," and "Hurricane" 

We modified the testbed so that, for each query, it downloaded 30 of the 200 pages 
returned by Alta Vista and all of the images on those pages. The 30 pages were taken 
from the first, eleventh, and twentieth search results pages.¹ This procedure could 
have produced 360 Web pages, but only 276 pages containing a total of 1578 non- 
decorative images were accessible. For each image, we recorded which of the eight 
clues would have caused that image to be retrieved by our software. In addition, one 
of us (Tsymbalenko) looked at each image and classified it as either "relevant" or "not 
relevant" to the query word. 



4.1 Results 

We used the human relevance ratings and the data about which images would have 
been retrieved to compute the standard information retrieval measures of precision and 
recall. Precision is the proportion of images that a clue caused to be retrieved that are 
actually relevant to the query. It is computed by the formula 

\[ \text{Precision} = \frac{\text{Retrieved images that are relevant}}{\text{Total retrieved images}} \]

Recall is the proportion of relevant images (out of the "complete" collection) that are 
retrieved and computed by the formula 

\[ \text{Recall} = \frac{\text{Relevant images that were retrieved}}{\text{Total relevant images in collection}} \]
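
As a worked illustration of the two measures just defined, the sketch below computes both from sets of image identifiers. The function name `precision_recall` and the sample file names are hypothetical and are not data from the experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall); a measure is None when its denominator is zero."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # retrieved images that are also relevant
    precision = len(hits) / len(retrieved) if retrieved else None
    recall = len(hits) / len(relevant) if relevant else None
    return precision, recall


# Made-up example: one clue retrieves four images, five images are relevant.
retrieved_by_clue = {"a.jpg", "b.jpg", "c.jpg", "d.jpg"}
judged_relevant = {"a.jpg", "b.jpg", "c.jpg", "e.jpg", "f.jpg"}
p, r = precision_recall(retrieved_by_clue, judged_relevant)
print(p, r)  # 0.75 0.6
```

Returning `None` for an empty retrieved set mirrors the convention, used later in the paper's precision table, that precision is not computed when nothing is retrieved.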

It is important to give a cautionary note about our recall statistics. Recall is nor- 
mally computed using some standard body of material (e.g. one year's issues of a ma- 
jor newspaper), called a corpus, which is used as the entire "collection" over which 
searches are performed. Our recall statistics were computed using the 276 HTML doc- 
uments returned by Alta Vista as our corpus. This is clearly not a valid approach, since 
the set of documents returned by Alta Vista were chosen precisely because they were 

¹We originally chose this approach in the mistaken belief that there would be interesting differences 
between the first search results (high relevance) and the last search results (low relevance). In retrospect, this 
was a pointless exercise, because Alta Vista was finding tens of thousands of Web pages that matched our 
queries, but only providing the best 200 of these. All of these 200 pages were highly relevant to our queries. 
                                        Clue
Query            1      2      3      4      5      6      7      8
Gorbachev        [?]    [?]    [?]    [?]    [?]    [?]    [?]    [?]
Yeltsin          60.0   [?]    0.0    0.0    0.0    0.0    0.0    0.0
Streisand        23.0   84.6   0.0    0.0    0.0    [?]    [?]    [?]
Yelena           [?]    85.0   5.0    0.0    [?]    [?]    [?]    [?]
Ekaterina        [?]    60.0   14.3   0.0    0.0    0.0    0.0    [?]
Paris            [?]    [?]    0.0    [?]    0.0    0.0    0.0    0.0
London           12.5   95.8   12.5   0.0    0.0    0.0    0.0    0.0
Bremen           80.0   90.0   33.0   0.0    0.0    0.0    0.0    0.0
Spokane          90.0   100.0  30.0   0.0    0.0    0.0    0.0    0.0
Explosion        25.0   75.0   0.0    0.0    0.0    0.0    0.0    0.0
Sunset           88.8   11.1   0.0    0.0    0.0    0.0    0.0    0.0
Hurricane        53.8   38.5   0.0    0.0    0.0    0.0    0.0    0.0
Median percent   39.9   79.5   2.5    0.0    0.0    0.0    0.0    0.0

Table 1: Recall percentages for each clue and each query. 



highly relevant to our query and this could easily bias our results. Unfortunately, we 
know of no standard Web corpus on which to perform our tests. Thus, we use our recall 
statistics only to give initial results on the relative merits of our metadata clues, not to 
make comparisons to other search approaches. 

Our recall results are shown in Table 1. Only clues 1 (image file name) and 2 (con- 
tent of document's TITLE element) show high levels of recall with medians of 39.9% 
and 79.5%, respectively. Clue 3 (value of the ALT attribute of the IMG element) shows 
a modest level of recall with a median of only 2.5%, but individual recall percentages 
as high as 33%. Only one other clue, number 6 (textual content of a paragraph that 
contains an IMG element), showed any recall at all. 

Precision results are shown in Table 2. For clue 1 (image file name), precision 
ranges from 36% to 100% 

4.2 Discussion 

Examining the results of the image search experiment closely, several key results emerge. 

The three clues (1, 2, and 3) that show significant levels of recall are relatively 
simple. Image file name (clue 1) presumably works because Web site designers prefer 
mnemonic names for image files. The TITLE element (clue 2) is designed to provide a 
high-level description of a document's content and is widely used because it gets listed 
in search engine results and the browser's title bar. The ALT attribute of the IMG 
element (clue 3) is explicitly designed to be a textual alternative to the image itself. 
The remaining five clues generally emphasize HTML's underlying structural model, 
and based on our results, do not seem to be widely used idioms among HTML authors 
and showed essentially no recall. 

Looking at the types of queries, image file name (clue 1) had poor recall for the 
                                        Clue
Query            1      2      3      4      5      6      7      8
Gorbachev        83.0   46.0   100.0  -      -      100.0  -      -
Yeltsin          100.0  60.0   -      -      -      -      -      -
Streisand        100.0  47.8   -      -      -      -      -      -
Yelena           66.7   89.5   100.0  -      -      -      -      -
Ekaterina        -      100.0  100.0  -      -      -      -      -
Paris            84.0   70.0   -      -      -      -      -      -
London           60.0   46.9   75.0   -      -      -      -      -
Bremen           66.7   69.2   100.0  -      -      -      -      -
Spokane          75.0   71.4   100.0  -      -      -      -      -
Explosion        100.0  50.0   -      -      -      -      -      -
Sunset           36.0   50.0   -      -      -      -      -      -
Hurricane        100.0  35.7   -      -      -      -      -      -
Median percent   83.0   55.0   87.5   0.0    0.0    0.0    0.0    0.0

Table 2: Precision percentages for each clue and each query. Dashes are used for clue- 
query combinations that had zero recall, since precision cannot be computed when 
there is no recall. Median precision percentages are computed based only on those 
queries that had some recall. 

names of people, but excellent recall for place name queries, particularly the less fa- 
mous cities. An informal look at the details of this phenomenon suggests that Web 
designers often use nicknames for the image file names of people (e.g. "Gorby" for "Gor- 
bachev"), but usually use full names for places. 

The precision results are more striking. The three clues that had some recall 
all showed good precision. The image file name had precision ranging from 36% to 
100% with a median of 83%. The content of the TITLE element had precision ranging 
from 35.7% to 100% with a median of 55%. These precision results are strong by the 
normal standards of textual information retrieval. The precision results for the value 
of the ALT attribute of the IMG element are quite impressive. In general, this clue had 
100% precision. 
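
The ranges and medians quoted in this paragraph can be checked directly against the clue 1 and clue 2 columns of Table 2. The sketch below is an illustrative check only: the values are transcribed from the table as extracted here, entries with zero recall are omitted as the table caption specifies, and the variable names are hypothetical.

```python
from statistics import median

# Clue 1 (image file name): eleven queries with some recall (Ekaterina has none).
clue1_precision = [83.0, 100.0, 100.0, 66.7, 84.0, 60.0, 66.7, 75.0, 100.0, 36.0, 100.0]
# Clue 2 (TITLE element): all twelve queries.
clue2_precision = [46.0, 60.0, 47.8, 89.5, 100.0, 70.0, 46.9, 69.2, 71.4, 50.0, 50.0, 35.7]

summary = {
    "clue 1": (min(clue1_precision), median(clue1_precision), max(clue1_precision)),
    "clue 2": (min(clue2_precision), median(clue2_precision), max(clue2_precision)),
}
print(summary)
# {'clue 1': (36.0, 83.0, 100.0), 'clue 2': (35.7, 55.0, 100.0)}
```

Both results agree with the figures stated in the text (36% to 100% with a median of 83%, and 35.7% to 100% with a median of 55%).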

Now, it is possible to examine the original research questions that we posed. 

First, we asked which HTML features revealed the most information about the 
images in a document. It is clear that only three of the HTML features that we tested 
showed any real utility for identifying the content of images. Image file name, the 
content of the TITLE element, and the value of the ALT attribute appear useful in 
image search, while the other clues we tested do not appear useful. 

Second, we asked whether the type of query affected our image search results. The 
type of query does seem to affect our recall results, where the names of people show 
less recall than the names of places. No consistent effect can be seen in the precision 
results. 
4.3 Cautionary Notes 

These results should be viewed cautiously. This was a small study and it has some 
flaws. 

• Our results are affected by our use of the Alta Vista search engine. It is not 
clear what effect this had, but the use of a different search engine might produce 

different results. 

• Our relevance ratings for the images were performed by one person. They should 
really be based on the judgement of multiple relevance raters. Also, image rele- 
vance is harder to judge than text relevance and may require somewhat different 
rating methods. 

• The distinction between famous and non-famous people is confounded with an- 
other effect. All of the famous names used were family names. Both of the non- 
famous names used were personal names. Also, one of the non-famous names 
is actually the personal name of a moderately famous person (the skater Ekaterina 
Gordeeva). 

• The one word queries we used do not allow the construction of very precise 
queries, especially for the names of people. 

• Our use of the Tidy program may have removed some clues. We believe that an 
author writing HTML like the following, probably views the image as part of the 
first paragraph (that is, a child of the paragraph). Tidy makes the image a sibling. 

<P>Some text, <IMG src="img.gif"> 
<P>More text. 



5 Conclusion and Suggestions for Future Research 

This paper has described new software for finding images on the Web based on simple 
text queries and an experiment testing the techniques used by that software. The soft- 
ware used a text search engine to find documents containing text matching the query 
and then analyzes the content of the document to determine whether the images in that 
document may be relevant to the query. The experiment demonstrated that some of the 
techniques used to identify relevant images were effective and that others were not. Its 
results also suggested that the type of query made alters the effectiveness of the search 
technique. 

Why is this software interesting? Image search is an inherently interesting prob- 
lem and is being studied widely. This software is interesting because it is able to find 
relevant images without actually downloading or analyzing those images. Instead, it 
examines only the text that surrounds the reference to the image in the HTML docu- 
ment. It is widely known that image download requires substantial amounts of time 
when traversing the Web. Any system that can find images without downloading them 
has an inherent performance advantage over systems that must download images. Fur- 
thermore, we believe that the text in an HTML document gives more precise semantic 
information about the content of images than any existing image processing technique. 
Image processing can be used to determine that an image shows the face of a person, 
but the file name of the image may say exactly what person is shown. 

Considerably more research is called for. Our basic results showing the success of 
our technique need to be replicated using a larger study, more complex queries, and 
a more robust relevance rating system. While our techniques appear strong and they 
have an efficiency advantage over image processing approaches, there is no reason that 
textual metadata cannot be combined with image processing in order to produce even 
better results. Finally, we continue to believe the type of query will interact with search 
heuristics in interesting ways. 
Chapter 3 



inal image. When the numeric range of a pixel's brightness is increased, so is the 
pixel's brightness resolution. 

Digressing momentarily, let's clarify our definitions of the terms intensity and 
brightness. "Intensity" (or, more appropriately, radiant intensity) refers to the magni- 
tude, or amount, of light energy actually reflected or transmitted from a physical 
scene. The term "brightness" (or, more appropriately, luminous brightness) refers to 
the measured intensity after it is acquired (say, using a video camera), sampled, 
quantized, displayed, and observed (with our eyes). The brightness of a pixel 
accounts for all the effects induced by the entire imaging system. This may seem 
like a somewhat trivial distinction, but it accounts for the fact that the measured 
brightnesses in our digital image are only representations of the actual energy radi- 
ated from the original physical scene. Additionally, the terms "intensity," "bright- 
ness," "radiance," and "luminance" are often used synonymously to mean the 
same thing. They are, however, distinctly different terms relating to measures of 
the quantity of light. As we discuss digital image processing operations, we will 
refer to digital pixels as having a brightness property. 

Following the sampling process, each sample is quantized. This quantization 
process converts the continuous-tone intensity, at the sample point, to a digital 
brightness value. The accuracy of the digital value is directly dependent upon how 
many bits are used in the quantizer. If three bits are used, the brightness can be 
converted to one of eight gray levels. In this case, gray level "0" represents black, 
gray level "7" represents white, and gray levels "1" through "6" represent the 
ascending gray tones between black and white. The eight gray levels comprise 
what is called the gray scale, or in this case, the 3-bit gray scale. 

With a 4-bit brightness value, every pixel's brightness is represented by one of 
16 gray levels. A 5-bit brightness value yields a 32-level gray-scale range. An 8-bit 
brightness value yields a 256-level gray-scale range. Every additional bit used to 
represent the brightness doubles the range of the gray scale. The range of the gray 
scale is also referred to as dynamic range. An image with 8-bit brightness values is 
said to have an available dynamic range of 256 to 1. Several gray scales are shown 
in Figure 3.10. Notice how the smoothness of the gray scale improves as more bits 
are used to represent brightnesses. 
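
The relationship between bit depth and gray levels described above can be sketched in a few lines. The uniform mapping from a normalized intensity in [0.0, 1.0] to an integer gray level is an assumption made for illustration; the text does not specify the quantizer's transfer function, and the names `gray_levels` and `quantize` are hypothetical.

```python
def gray_levels(bits):
    """Number of gray levels available with the given number of bits."""
    return 2 ** bits


def quantize(intensity, bits):
    """Map an intensity in [0.0, 1.0] to an integer gray level 0 .. 2**bits - 1."""
    levels = gray_levels(bits)
    clamped = min(max(intensity, 0.0), 1.0)
    # Scale, truncate, and keep full-scale input inside the top level.
    return min(int(clamped * levels), levels - 1)


levels_by_bits = {bits: gray_levels(bits) for bits in (3, 4, 5, 8)}
print(levels_by_bits)  # {3: 8, 4: 16, 5: 32, 8: 256}

# A short intensity ramp quantized to the 3-bit gray scale (0 = black, 7 = white).
ramp = [0.0, 0.1, 0.5, 0.9, 1.0]
ramp_3bit = [quantize(v, 3) for v in ramp]
print(ramp_3bit)  # [0, 0, 4, 7, 7]
```

With only three bits, nearby intensities such as 0.0 and 0.1 collapse into the same gray level, which is the loss of brightness resolution the passage describes.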

Figure 3.11 shows an image quantized to various brightness resolutions. The
image quantized to eight bits of brightness resolution appears very natural and
continuous. As the brightness resolution decreases, the image appears coarser and
more mechanical. This effect is known as brightness contouring, or posterization.
Contouring occurs when there are not enough gray levels to represent the actual
brightness in the original image adequately. Gradual brightness changes end up
quantized a little brighter or dimmer than their original intensities. Brightness
contouring is the effect of insufficient brightness resolution.

Figure 3.12 shows the same image broken into individual bit-planes. Each bit
plane represents the on or off level of the particular bit's contribution to the
overall pixel brightness. Clearly, much of the structure of the image is convey






A Content-based Image Retrieval System on the Mode of Network 



Hongmei Tang Ming Yu Zhitao Xiao Yingchun Guo
Hebei University of Technology, Tianjin, 300130, China
" E-mail: vumingf^hebut.edu.cn 



Abstract 

The paper presents a small image retrieval system operating
over a computer network. In order to overcome some
limitations of traditional retrieval methods, we design
an image retrieval system that integrates text-based and
content-based image retrieval (CBIR) techniques. When
the user submits a query, the system executes the query
programs. The retrieval results are shown to the user
through a Web browser, ordered according to their similarity
to the queried image. Experiments show that this system is
simple, practical and effective, and can be extended easily.

I. Introduction

With the coming of the information society and the wide
application of Internet technology, people encounter
more and more images and videos. How to organize, manage
and retrieve this information efficiently has become
an urgent task. Image retrieval is one of the
most important tasks in image processing and machine
intelligence. It has been used widely in digital libraries,
remote teaching, remote medicine, trademark management,
security systems, material analysis, etc.

The text-based approach relies on text descriptions of
image contents and makes use of traditional
information retrieval (IR) techniques. Text-based
techniques can capture high level abstractions and concepts,
and it is easy to issue text queries. But text descriptions are
sometimes subjective and incomplete, and cannot depict
complicated image features very well. Text-based
techniques also cannot accept pictorial queries (query by
example).

In recent years, content-based image retrieval
techniques have been proposed to overcome the
limitations of the text-based retrieval techniques [1-3].
CBIR (content-based image retrieval) technology means that
the low-level visual features of images, such as color,
shape, texture, contour and location, can be used as an image
content table to match and retrieve images. Content-based
techniques can capture low-level image features and
accept pictorial queries. But they cannot capture high-level
concepts. The pictorial query process is also hard to start, as
the user has to specify the query image by selecting an
existing image or drawing a sketch.

We propose to integrate the text-based and content-based
techniques into one system. Such a system can
capture both high and low level features. Users can start
their search process by issuing a text query. From the
initial returned images, they can select images as content-based
queries. The final returned images are based on
combined matching scores of the text-based and content-based
searching, incorporating the user's preference
weighting. The system is easy to use because the user
starts a retrieval process by typing in keywords instead of
having to use example images or to describe image
features.
There are many content-based image retrieval
techniques, such as those based on color, texture and shape
[1-3]. In our prototype system, we implemented the color
based retrieval technique for the following reasons. Firstly,
the color-based technique has been reported to produce
good retrieval performance. Secondly, it is simple to
implement. Unlike the texture based and shape based
methods, it does not require image segmentation, which
is itself a hard image processing problem.

Another image analysis, of an object's symmetry, is also
included in our image retrieval system. Symmetry is an
important mechanism by which we identify the structure
of objects. A limited number of approaches have been
tried in the detection of symmetry in images. In our system,
the symmetry analysis is performed using the local energy
model and phase congruency.

In the following section, we describe the details of the
proposed integrated image retrieval system. We then
present some experimental results in Section 3. Section 4
concludes the paper.

II. The Image Retrieval System

2.1 The image retrieval system architecture 

The Web browser/Web server (B/S) mode has been
adopted in our image retrieval system. The outstanding
characteristic of this management mode is that the programs,
database and other components are all concentrated on the
server. The maintenance and updating of the system, the
backup of the database and daily maintenance are all
carried out on the server. The browser side does not need any
other software or any related management and maintenance tasks.
Thus, the data that the users retrieve and operate on all come
from the same database, which ensures the data's integrity.
Client and server interact through the common gateway interface
(CGI) or active server pages (ASP). Generally speaking,
when the Web server application is running, it listens for
user requests on port 80 according to the HTTP standard
(other ports may be used, depending on how the server is
set up). When the users send their requests through the
browser, the application obtains the parameters offered by
the users and carries out the response accordingly. In the
end, it sends the final result to the users, completing the
interaction. The users can perform database operations
quickly and conveniently through the universal client tool,
the browser. Our image retrieval system architecture is
shown in Figure 1.

2.2 The Integrated Image retrieval process 

We first collect images and HTML documents and create
indexes based on text descriptions and color histograms of
the images. Then, during the retrieval phase, the
system calculates the similarity between the query and
indexed images based on a combination of text and color
indexes. We can carry out text-based image indexing and
retrieval based on text descriptions within HTML
documents using traditional text information retrieval
(IR) techniques [4]. Different HTML documents are
created by different authors. Different forms of the same
words may be used in different documents. Words or
terms appearing at different locations of an HTML
document have different levels of importance or relevance
to related images. For example, a word appearing in the
image URL is more important to the image than another
word appearing in a paragraph somewhere else in the
document. Therefore, in order to improve retrieval
effectiveness, we have assigned term weights based on
term positions: higher weights are assigned to terms that
are considered more important based on the positions at
which they appear.
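
A minimal sketch of position-based term weighting follows. The paper does
not give its weight values or the full set of positions it distinguishes, so
the `POSITION_WEIGHTS` table and the `weighted_term_scores` function are
hypothetical; only the idea that a term in the image URL counts for more than
a term in an ordinary paragraph comes from the text.

```python
from collections import defaultdict

# Hypothetical weights: the paper states only that terms in more important
# positions (such as the image URL) receive higher weights.
POSITION_WEIGHTS = {"image_url": 3.0, "paragraph": 1.0}


def weighted_term_scores(occurrences):
    """Sum the position weights of every occurrence of each term.

    occurrences: iterable of (term, position) pairs taken from one HTML document.
    """
    scores = defaultdict(float)
    for term, position in occurrences:
        # Unknown positions fall back to the lowest weight (an assumption).
        scores[term.lower()] += POSITION_WEIGHTS.get(position, 1.0)
    return dict(scores)


doc = [("Butterfly", "image_url"), ("butterfly", "paragraph"), ("garden", "paragraph")]
print(weighted_term_scores(doc))  # {'butterfly': 4.0, 'garden': 1.0}
```

Lower-casing stands in here for the normalization of different word forms; a
real index would more likely use stemming.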




[Figure 1 diagram labels: User (three instances); Web browser;
Request/Response over HTTP; Web server with HTML Documents;
Request Operation / Results Return; Internet/Intranet; CGI;
The Interface of DBMS and Web; Database Manage System;
Communication System (TCP/IP, HTTP).]

Fig. 1 The Architecture of Image Retrieval System

In the color based image retrieval technique, each image
in the database is normally represented using the three
primaries of the color space chosen. Each color channel is
quantized into $m$ intervals, so the total number of discrete
color combinations (called bins) is $n = m^3$. A color
histogram $H(M)$ is a vector $(h_1, h_2, \ldots, h_n)$, where each
element $h_j$ represents the number of pixels falling in bin $j$
in image $M$. These histograms are the feature vectors
(indexes) to be stored as the index of the image database.
During image retrieval, a histogram is found for the query
image or estimated from the user's query. A metric is used
to measure the distance between the histograms of the
query image and images in the database. If images are of
different sizes, their histograms are normalized. Images
with a distance smaller than a pre-defined threshold are
retrieved from the database and presented to the user
according to their similarity to the queried image. The main
problem with the basic color based image retrieval is that
a color may be similar to the colors of more than one bin, but
it is normally quantized into only one bin, so that
images consisting of similar colors can have a very
large histogram distance. To solve this problem,
instead of dividing each color channel by a constant
(the quantization step) when obtaining the histogram, we propose
to find representative colors in the CIE L*u*v* color space.
The number of representative colors is equal to the
required number of bins. These representative colors are
uniformly distributed in the CIE L*u*v* color space.
While building the histogram, the ten perceptually most similar
representative colors are found for each pixel. Then
weights are assigned to these ten representative colors
inversely proportional to the color distances. The total
weight for each pixel is equal to 1. In this way, we obtain
a so-called perceptually weighted histogram (PWH) for
each image. It has been shown that the PWH based
method has higher image retrieval performance than the
normal color histogram based method [5].
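
The PWH construction can be sketched as follows. This is an illustration
under stated assumptions, not the authors' implementation: pixels and
representative colors are assumed to be already converted to CIE L*u*v*
(the conversion is omitted), Euclidean distance stands in for the color
distance, a pixel that coincides exactly with a representative color gives
that bin its whole weight, and the L1 metric is one possible choice for
comparing histograms. All function names are hypothetical.

```python
import math


def perceptually_weighted_histogram(pixels, representatives, k=10):
    """Build a normalized histogram with one bin per representative color.

    pixels, representatives: sequences of 3-tuples, assumed to be in a
    perceptually uniform space such as CIE L*u*v*.
    """
    k = min(k, len(representatives))
    hist = [0.0] * len(representatives)
    for pixel in pixels:
        # Distance from this pixel to every representative color.
        dists = [(math.dist(pixel, rep), idx) for idx, rep in enumerate(representatives)]
        nearest = sorted(dists)[:k]
        exact = [idx for d, idx in nearest if d == 0.0]
        if exact:
            # Assumption: an exact match takes the pixel's whole weight.
            hist[exact[0]] += 1.0
            continue
        inverse = [(1.0 / d, idx) for d, idx in nearest]
        total = sum(w for w, _ in inverse)
        for w, idx in inverse:
            hist[idx] += w / total  # the weights of one pixel sum to 1
    n = len(pixels)
    # Normalize by pixel count so images of different sizes are comparable.
    return [h / n for h in hist] if n else hist


def l1_distance(h1, h2):
    """L1 distance between two histograms (one possible metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


reps = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
print(perceptually_weighted_histogram([(5.0, 0.0, 0.0)], reps, k=2))  # [0.5, 0.5, 0.0]
```

A pixel halfway between two representative colors contributes half a count
to each bin, instead of falling entirely into one of them as it would in a
plain histogram.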

The user starts a retrieval process by typing in some key
words. Once the initial result is presented to the user, the
user can select an image to do a search based on its PWH.
Alternatively, the user can make a query based on a
combination of text and PWH. Figure 2 shows the
retrieval results based on key words and the PWH of the
selected image.
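
One simple way to combine the two kinds of evidence is a weighted sum, as
sketched below. The paper does not state its combination formula, so the
linear form, the assumption that both scores lie in the range 0 to 1, and
the names `combined_score` and `rank` are all hypothetical.

```python
def combined_score(text_score, color_score, text_weight=0.5):
    """Weighted sum of a text-match score and a PWH similarity, both in [0, 1]."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    return text_weight * text_score + (1.0 - text_weight) * color_score


def rank(candidates, text_weight=0.5):
    """Order image names by combined score, best first.

    candidates: dict mapping image name -> (text_score, color_score).
    """
    return sorted(
        candidates,
        key=lambda name: combined_score(*candidates[name], text_weight),
        reverse=True,
    )


images = {"a.jpg": (0.9, 0.2), "b.jpg": (0.3, 0.8)}
print(rank(images, text_weight=0.8))  # ['a.jpg', 'b.jpg']
print(rank(images, text_weight=0.2))  # ['b.jpg', 'a.jpg']
```

Changing the weight reverses the order of the two images, which is the
behavior the user controls when choosing between text and PWH emphasis.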

[Figure 2 screenshot: the queried image, followed by the retrieved result images.]

Fig. 2. The Image Retrieval Results

We can conclude that different query combinations
result in different images and presentation orders. The user
can choose an appropriate query combination based on
his/her requirements and knowledge. For example, the
user can assign a high weight to text if he/she is looking
for some high level concepts. On the other hand, if he/she
is looking for something with a similar color distribution,
he/she can assign a high weight to PWH. The retrieval
results are shown to the user through the Web browser
according to their similarity to the queried image. There are
two or three hyperlinked buttons under every image. By
clicking these buttons, you can get the corresponding image
information (such as the detailed information about this
image, or a display of the full image, etc.).

Another image analysis, of an object's symmetry, is also
included in our image retrieval system. Symmetry is an
important mechanism by which we identify the structure
of objects. Man-made objects, plants and animals are
usually highly recognizable from the symmetry, or partial
symmetries, that they often exhibit. A limited number of
approaches have been tried in the detection of symmetry
in images. A fundamental weakness found in most is that
they require objects to be segmented prior to any
symmetry analysis. For example, Atallah [6] describes an
algorithm that requires objects to be represented in terms
of points, line segments and circles. A new measure of
symmetry that does not require any prior
recognition or segmentation of objects is presented by
Peter Kovesi [7]. This measure is based on phase congruency
and the local energy model. The phase congruency function is
developed from the Fourier series expansion of a
one-dimensional signal, $I$, at some location, $x$,

$$I(x) = \sum_n A_n \cos(n\omega x + \phi_n) = \sum_n A_n \cos(\phi_n(x)),$$

where $A_n$ represents the amplitude of the $n$th cosine
component, $\omega$ is a constant (usually $2\pi$), and $\phi_n$
is the phase offset of the $n$th component. The function
$\phi_n(x)$ represents the local phase of the Fourier component
at position $x$. Phase congruency is defined as

$$PC(x) = \max_{\bar{\phi}(x) \in [0, 2\pi]} \frac{\sum_n A_n \cos(\phi_n(x) - \bar{\phi}(x))}{\sum_n A_n}.$$

The value of $\bar{\phi}(x)$ that maximizes this equation is the
amplitude weighted mean local phase angle of all the Fourier terms
at the point being considered. Phase congruency is
rather awkward to calculate. As an alternative,
Venkatesh and Owens [8] show that points of maximum
phase congruency can be calculated equivalently by
searching for peaks in the local energy function. The local
energy function is defined for a one-dimensional
luminance profile, $I(x)$, as the modulus of a complex
number,

$$E(x) = \sqrt{I^2(x) + H^2(x)},$$

where the real component is represented by $I(x)$ and the
imaginary component by $iH(x)$, where $i = \sqrt{-1}$ and $H(x)$
is the Hilbert transform of $I(x)$. Venkatesh and Owens prove that
energy is equal to phase congruency scaled by the sum of
the Fourier amplitudes, that is, $E(x) = PC(x) \sum_n A_n$. Phase
congruency is calculated via wavelets. Finally we can get a
measure of symmetry:

$\varepsilon$ is a small constant to avoid division by zero, $T$ is the
estimated noise influence, and $\lfloor\ \rfloor$ denotes that the
enclosed quantity is itself if it is positive, and zero for all
other values. The one-dimensional analysis can be extended to
two dimensions. The measure of symmetry is a normalized,
dimensionless measure that is independent of the
brightness or contrast of image features.

III. Experiment Results
The image retrieval system was tested on text searching, 
color searching and combined searching of text and color. 
Three queries for each set were performed on a database 
of approximately 800 images. The experiment results 
show that the combined retrieval returns results that 
matched either the semantic meaning of terms or the low- 
level color features of images or both. 

IV. Conclusions

In this paper, we described an integrated image retrieval
system. The system is based on improved text-based
and color-based image retrieval techniques. Our
experimental results obtained from a database of 800
images show that the integrated system has higher
retrieval performance than the text based and the color
based techniques alone. The system is also easy to use as it
allows the user to start a query by typing in keywords
and then to issue queries combining keywords and example
images with varying weights. Another image analysis,
of an object's symmetry, is also included in our image
retrieval system. The system is stable, convenient and can
be extended easily. The system realizes organization,
management and retrieval of the image database under
B/S mode.

Abstract — The Lorenz information measure
(LIM) is a function of the observed probability
sequence of digital signals, similar to the signal
entropy, and is approximately linearly related
to the mean absolute error (MAE) in simulations
employing uncorrupted and corrupted
2-dimensional gaussian and magnetic resonance
(MR) images. Unlike the MAE, the LIM does not
require an uncorrupted reference signal for a
distance computation. However, for the particular
difference signal case imposed by the definition
of the MAE, the LIM is asymptotically bounded
by the MAE/signal quantization number ratio.
Therefore, in applications where an uncorrupted
signal does not exist, and thus, the MAE is
undefined, the LIM provides a comparable signal
processing performance measure.

I. INTRODUCTION 

The mean absolute error (MAE) is a metric com- 
monly used in comparing signal processing algo- 
rithms. Accordingly, this computation requires an 
uncorrupted reference signal from which the distance 
to the observed signal is determined. However, if 
the reference signal does not exist or cannot be mea- 
sured, the MAE is undefined. Thus, an alternate 
measure for signal processing performance is neces- 
sary in the absence of a reference signal. The Lorenz 
information measure (LIM) is a function of the observed
probability sequence of a digital signal, similar
to the signal entropy, providing performance com- 
parable to the MAE without the requirement of an 
uncorrupted signal. 

A. Mean Absolute Error 

Consider subsets of the $l^1$ metric space consisting
of finite sets of finite valued sequences
$a = (a_0, a_1, \ldots, a_{m-1})$, representing the reference signal,
and $d = (d_0, d_1, \ldots, d_{m-1})$, representing the corrupted
signal, with the convergence condition provided by the
finite $l^1$ norm and the distance given
by [1-6]:

$$\|d - a\|_{l^1} = \sum_{i=0}^{m-1} |d_i - a_i| \qquad (1)$$

where $d = a + \nu$ and $\nu$ is the additive noise. Assuming
that the originally real-valued sequences are
uniformly quantized into at most $n$ discrete levels for
standard digital signal processing applications, the
resulting digital signals may be scaled to represent
arbitrary rational numbers and are, thus, considered
integers to facilitate the analysis without loss of
generality. The MAE is the metric given by:

$$\frac{1}{m}\|d - a\|_{l^1} = \frac{1}{m}\sum_{i=0}^{m-1} |d_i - a_i| \qquad (2)$$
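
As a small illustration of the MAE definition, the following sketch computes
the mean of the absolute element-wise differences between a reference
sequence and a corrupted one. The function name and the sample values are
made up for the example.

```python
def mean_absolute_error(reference, corrupted):
    """Mean of |d_i - a_i| over two equal-length integer sequences."""
    if len(reference) != len(corrupted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(d - a) for a, d in zip(reference, corrupted)) / len(reference)


a = [64, 70, 128, 192]   # hypothetical reference signal
d = [66, 69, 128, 188]   # hypothetical corrupted signal
print(mean_absolute_error(a, d))  # (2 + 1 + 0 + 4) / 4 = 1.75
```

The computation needs the reference sequence, which is exactly the
requirement that the LIM is intended to avoid.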



B. Lorenz Information Measure 

Continuing with this difference signal structure
imposed by the definition of the MAE for the development
of the Lorenz information measure, define
the histogram $h = (h_0, h_1, \ldots, h_{n-1})$, where $h_j$ is the
number of elements in $|d - a|$ with the value $j$, where
$0 \le j \le n - 1$. Scaling the elements of $h$ by $m$,
the total number of elements in $|d - a|$, produces a
probability sequence given by:

$$p = \left(\frac{h_0}{m}, \frac{h_1}{m}, \ldots, \frac{h_{n-1}}{m}\right) = (p_0, p_1, \ldots, p_{n-1}), \qquad 0 \le p_j \le 1 \;\; \forall j \qquad (3)$$

equivalent to the sequence required to compute the
signal entropy according to:

$$H = -\sum_{j=0}^{n-1} p_j \log_2(p_j) \qquad (4)$$

and the minimum quantization number $2^H$ [6].

Figure 1. A family of Lorenz curves is illustrated with a
uniformly distributed signal represented by the linear solid
line and signals approaching a constant depicted by the dashed
lines.

Arranging the sequence in ascending order results
in a monotonically nondecreasing sequence
$p^{\uparrow} = (p_{(0)}, p_{(1)}, \ldots, p_{(n-1)})$, such that
$p_{(0)} \le p_{(1)} \le \cdots \le p_{(n-1)}$. Graphing the partial sums
$s = (s_{(0)}, s_{(1)}, \ldots, s_{(n)})$, given by:

$$s_{(0)} = 0, \qquad s_{(i)} = s_{(i-1)} + p_{(i-1)} \qquad (5)$$

as a piecewise linear function of $i/n$ results in the family
of Lorenz curves. A uniformly distributed sequence
where $p_{(j)} = 1/n$ produces the linear Lorenz
curve represented by the solid line in Figure 1. With
decreasing uniformity, the signal approaches a constant
with probability sequence element $p_{(n-1)} = 1$,
producing convex curves illustrated by the dashed
lines of Figure 1. The LIM is defined by the area under
the Lorenz curve above the $i/n$ axis [7-10]. This
measure is expressed by:

$$\mu_{LIM} = \frac{1}{n}\left(\frac{1}{2} + p_{(n-2)} + 2p_{(n-3)} + \cdots + (n-1)p_{(0)}\right) = \frac{1}{n}\left(\frac{1}{2} + \sum_{j=0}^{n-1} (n-1-j)\,p_{(j)}\right) \qquad (6)$$
Accordingly, constant signals result in the minimum
value $\mu_{LIM} = \frac{1}{2n}$ and uniformly distributed signals
produce the maximum value $\mu_{LIM} = \frac{1}{2}$. Similarly,
the MAE is expressed in terms of the unordered
probability sequence $p$ according to:

$$\mu_{MAE} = p_1 + 2p_2 + \cdots + (n-1)p_{n-1} = \sum_{j=0}^{n-1} j\,p_j \qquad (7)$$
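
The quantities defined in this section can be sketched in Python as follows.
The function names are hypothetical, and the code assumes integer-valued
inputs in the range 0 to n - 1; it computes the probability sequence of a
set of values, the LIM as the closed-form area under the Lorenz curve, the
MAE from the unordered probability sequence, and the entropy.

```python
import math
from collections import Counter


def probability_sequence(values, n):
    """Histogram of integer values in 0..n-1, scaled by the number of samples."""
    counts = Counter(values)
    m = len(values)
    return [counts.get(j, 0) / m for j in range(n)]


def lim(p):
    """Lorenz information measure: (1/n) * (1/2 + sum of (n-1-j) * p_(j))."""
    n = len(p)
    ordered = sorted(p)  # ascending order: p_(0) <= ... <= p_(n-1)
    return (0.5 + sum((n - 1 - j) * pj for j, pj in enumerate(ordered))) / n


def mae_from_p(p):
    """MAE from the unordered probability sequence of |d - a|: sum of j * p_j."""
    return sum(j * pj for j, pj in enumerate(p))


def entropy(p):
    """Signal entropy in bits; zero-probability terms contribute nothing."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)


n = 4
print(lim([1.0, 0.0, 0.0, 0.0]))   # constant signal: 1 / (2n) = 0.125
print(lim([0.25] * n))             # uniform signal: 0.5
p = probability_sequence([1, 1, 2, 1], n)
print(p, mae_from_p(p), lim(p))    # [0.0, 0.75, 0.25, 0.0] 1.25 0.1875
```

The two extreme cases reproduce the minimum and maximum LIM values, and
`2 ** entropy(p)` gives the minimum quantization number for the same
sequence.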

II. METHODS 

A theorem is presented providing a linear asymp- 
totic bound of the LIM by the MAE/signal quantiza- 
tion number ratio for the difference signal case. This 
theory motivates performing simulations producing 
results which, although not directly obtainable from 
the theorem, illustrate the utility of the LIM. The 
simulations consist of employing 2-dimensional gaus- 
sian and magnetic resonance (MR) images as un- 
corrupted references, and corrupting these images 
with approximately zero mean gaussian, uniform, 
and combined gaussian and uniform additive noise 
with various standard deviations. Mean and median 
filtering are performed on these corrupted images in 
order to demonstrate the effect of standard filtering 
operations on the LIM in quantifying filtering perfor- 
mance compared to the entropy and minimum quan- 
tization number. The LIM, entropy, and minimum 
quantization number are computed for each image 
and plotted as a function of the MAE. Accordingly, 
the LIM is demonstrated to vary approximately lin- 
early with the MAE for these simulations. 

III. RESULTS 

A. Theory 

The bound of the LIM presented below follows 
from a result in the rearrangement of finite sets of 
variables provided in reference [11] which is pre- 
sented without proof as a lemma. 

Lemma 1: Given a finite sequence with real valued,
finite elements $a = (a_0, a_1, \ldots, a_{n-1})$ with the
rearrangement in ascending order denoted by
$a^{\uparrow} = (a_{(0)}, a_{(1)}, \ldots, a_{(n-1)})$ such that
$a_{(0)} \le a_{(1)} \le \cdots \le a_{(n-1)}$.

Then:

$$\sum_{j=0}^{n-1} (n-1-j)\,a_{(j)} \le \sum_{j=0}^{n-1} j\,a_j \qquad (8)$$

In the general form of this lemma, the sum of the 
pairwise element products of two sequences is a min- 
imum when the elements are monotonic in opposite 
senses and a maximum when monotonic in the same 



sense. The LIM bound for the difference signal case
is subsequently provided as a theorem.

Theorem 1: Given:

i) The finite sequences $a = (a_0, a_1, \ldots, a_{m-1})$ and
$d = (d_0, d_1, \ldots, d_{m-1})$ with elements possessing
at most $n$ distinct finite integer values

ii) The probability sequence $p$ derived from the
histogram $h$ of $|d - a|$

iii) The Lorenz information measure $\mu_{LIM}$ and the
mean absolute error $\mu_{MAE}$ defined by:

$$\mu_{LIM} = \frac{1}{n}\left(\frac{1}{2} + \sum_{j=0}^{n-1} (n-1-j)\,p_{(j)}\right), \qquad \mu_{MAE} = \sum_{j=0}^{n-1} j\,p_j$$

Then:

$$\mu_{LIM} \le \frac{1}{n}\left(\mu_{MAE} + \frac{1}{2}\right) \qquad (9)$$

Proof: Rearranging Equation (6) results in:

$$n\,\mu_{LIM} - \frac{1}{2} = \sum_{j=0}^{n-1} (n-1-j)\,p_{(j)} \qquad (10)$$

Since the sequences $((n-1), (n-2), \ldots, 1, 0)$ and
$(p_{(0)}, p_{(1)}, \ldots, p_{(n-1)})$ are monotonic in opposite
senses, applying Lemma 1 results in:

$$\sum_{j=0}^{n-1} (n-1-j)\,p_{(j)} \le \sum_{j=0}^{n-1} j\,p_j$$

The right-hand side is $\mu_{MAE}$ by Equation (7). Applying this
inequality to Equation (10) and rearranging terms results in:

$$\mu_{LIM} \le \frac{1}{n}\left(\mu_{MAE} + \frac{1}{2}\right)$$

completing the proof. □
As a corollary, for increasingly large $\mu_{MAE}$ such
that $\mu_{MAE} \gg \frac{1}{2}$, then $\mu_{LIM} \lesssim \frac{\mu_{MAE}}{n}$,
where $\lesssim$ denotes asymptotically less than or equal to, resulting
from this $\mu_{MAE}$ approximating condition. Accordingly,
the LIM is asymptotically bounded by the ratio
of the MAE and the signal quantization number.
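
As a small numerical check of the bound, consider a made-up probability
sequence with $n = 4$ levels; the values below are chosen only for
illustration and do not come from the simulations.

```latex
% Hypothetical example: n = 4, p = (p_0, p_1, p_2, p_3) = (0.25, 0.5, 0.25, 0)
\begin{align*}
\mu_{MAE} &= \sum_{j=0}^{3} j\,p_j = 0(0.25) + 1(0.5) + 2(0.25) + 3(0) = 1.0 \\
p^{\uparrow} &= (0,\ 0.25,\ 0.25,\ 0.5) \\
\sum_{j=0}^{3} (3-j)\,p_{(j)} &= 3(0) + 2(0.25) + 1(0.25) + 0(0.5) = 0.75 \\
\mu_{LIM} &= \tfrac{1}{4}\left(\tfrac{1}{2} + 0.75\right) = 0.3125 \\
\tfrac{1}{n}\left(\mu_{MAE} + \tfrac{1}{2}\right) &= \tfrac{1}{4}(1.0 + 0.5) = 0.375 \;\ge\; 0.3125
\end{align*}
```

The bound of Equation (9) holds with some slack here. It becomes an equality
when the unordered sequence $p$ is already nonincreasing in $j$, for example
$p = (0.5, 0.25, 0.25, 0)$.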

The above development considers probability sequences
constructed from the difference signal $|d - a|$.
However, as discussed previously, the uncorrupted
reference signal $a$ is unknown in general, resulting
in an undefined MAE. Alternatively, if the MAE is
computed according to Equation (7) assuming that $p$
is simply the corrupted signal rather than the difference
signal probability sequence, the resulting measure
is the absolute mean, or simply the mean, for
strictly non-negative $a$. Clearly, the mean is not a
sensitive performance measure for signals with additive
zero-mean noise processes. Conversely, the LIM
is meaningfully defined for any discrete signal as a
function of the probability sequence, independent of
a reference signal, similar to $H$ and $2^H$. Of course,
the domain (time/space) information of the MAE resulting
from the difference signal structure of Equation (2)
is lost in each of the other measures. Therefore,
since the above theoretical result based on difference
signals is suggestive of a relation to the MAE
but not directly applicable to the general case where
the reference signal does not exist, simulations are
employed to demonstrate the utility of the LIM.
Accordingly, for the simulations, the MAE is computed
using the reference signal as required, while the LIM,
$H$, and $2^H$ are evaluated from the corrupted signal
only.

B. Simulations 

Simulations are provided to demonstrate the 
use of the LIM in imaging applications. Uncor- 
rupted and corrupted 8 bit, 64 x 64 gaussian 
and 256 x 256 MR images and the filtered re- 
sults are generated. The LIM, minimum quanti- 
zation number, and entropy are computed exclu- 
sively from the corrupted images, while the MAE 
is computed from the corrupted/uncorrupted dif- 
ference images, as required by the definition. The 
uncorrupted image values are contained in the set
(64, 65, ..., 192). Six corrupted gaussian images are
generated with gaussian additive noise standard deviations
contained in (1.1, 2.1, 4.0, 8.0, 16.0, 19.6). Similarly,
the gaussian image is corrupted with uniform
and combined gaussian and uniform noise
with standard deviations (1.1, 2.0, 4.0, 8.0, 16.0, 36.6)
and (0.8, 1.5, 2.8, 5.6, 11.3, 20.9), respectively.
Subsequently, another 18 corrupted MR images are
generated with gaussian, uniform, and combined
noise standard deviations (1.0, 2.0, 4.0, 8.0, 16.0, 17.5),
(1.0, 2.0, 4.0, 8.0, 16.0, 36.4), and (0.7, 1.4, 2.8, 5.7, 11.3,
20.2), respectively. Finally, mean and median filtering
is performed and representative images are
presented in Figures 2-3.

The computed MAE, LIM, and $2^H$ values for the
unfiltered and filtered corrupted images are plotted
in Figures 4-5. The LIM is demonstrated to provide
slightly to significantly greater linearity with the
MAE compared to $2^H$, according to the correlation
coefficient $r^2$. Indeed, the graphs of $2^H$ in Figures 4-5
appear to have a slightly logarithmic functional form
close to the $2^H$ axis, similar to the obviously
logarithmic functional form of the entropy graphs
illustrated in Figure 6. The linear correlation coefficients
of these entropy plots for the gaussian and MR images
are 0.834 and 0.757, respectively: significantly
less than the logarithmic correlation coefficients of
0.948 and 0.897, as presented in the figure. Thus,
in these simulations, the LIM is demonstrated to be
more linear with the MAE for the unfiltered and filtered
corrupted images than the minimum quantization
number and the obviously logarithmic entropy.
This approximately linear relation is consistent with
the theoretical results.

Figure 2. The uncorrupted and selected corrupted gaussian
images and the median and mean filtered results are displayed
from top to bottom respectively with the corresponding MAE,
LIM, and minimum quantization number provided below.

IV. DISCUSSION 

Figure 3. The uncorrupted and uniform noise corrupted MR
images and the median and mean filtered results are displayed
from top left to bottom right respectively with the corresponding
MAE, LIM, and minimum quantization number provided
below.

Statistical measures, such as the signal mean and
standard deviation, are alternative quantities providing
signal analysis capability. However, for the
standard zero mean noise process commonly used
in modeling, the mean is unchanged between uncorrupted
and corrupted signals, producing no information
about the benefits of signal processing algorithms
performed on the corrupted signal. In addition,
the standard deviation is observed also to exhibit
less linearity with the MAE than the LIM in
these simulations. As another alternative measure,
the signal entropy is expected to be logarithmic in $p$
by inspection. In order to suppress this logarithmic
functional form, the minimum quantization number,
representing the minimum number of quantization
levels required to represent the signal, is defined as
$2^H$. As a result, this measure displays some linearity
with the MAE in the simulations, although less than
the LIM. This is consistent with the asymptotic result
of the corollary, where the MAE is much greater
than $\frac{1}{2}$. Accordingly, the LIM provides a similar
performance measure to the MAE in the analysis of
filtering operations, for example, without requiring
knowledge of an uncorrupted signal as a target for
comparison of filtering results. Thus, general signal
processing operations may be quantified relative to
the LIM without the often nonexistent uncorrupted
reference signals necessary for MAE computation.
Obviously, limitations exist in the interpretation
of the LIM. For example, for signals which are
approximately uniformly distributed over the quantization
value set, the LIM approaches the maximum
value of $\frac{1}{2}$, and addition of noise obviously increases
the MAE without a similar increase in the LIM. This
observation holds equally for both $H$ and $2^H$. Thus,
the LIM, as well as $H$ and $2^H$, are more meaningful
when applied to signals with significantly nonuniformly
distributed histograms, which are more typically
observed in actual signals acquired from real
sources.

Figure 4. The LIM and minimum quantization number, from left to right, for the gaussian, uniform, and combined gaussian
and uniform noise corrupted gaussian and the median and mean filtered images are plotted as a function of MAE with the
linear regression equation and the correlation coefficient provided below.
LIM = 7.3045e-2 + (6.4355e-3) MAE, r2 = 0.991
2^H = 41.886 + (4.8024) MAE, r2 = 0.960

Figure 5. The LIM and minimum quantization number, from left to right, for the gaussian, uniform, and combined gaussian
and uniform noise corrupted MR and the median and mean filtered images are plotted as a function of MAE with the linear
regression equation and the correlation coefficient provided below.
LIM = 8.7405e-2 + (6.2976e-3) MAE, r2 = 0.978

Figure 6. The entropy for the gaussian, uniform, and combined gaussian and uniform noise corrupted gaussian, on the left, and
MR, on the right, and the median and mean filtered images are plotted as a function of MAE with the logarithmic regression
equation and the correlation coefficient provided below.
H = 5.5125 + 1.0203 log(MAE), r2 = 0.948 (gaussian)
H = 5.5863 + 1.1089 log(MAE), r2 = 0.897 (MR)

V. Conclusion 

The LIM is demonstrated to be approximately 
linearly related to the MAE. The theoretical LIM 
bound by the MAE/signal quantization number ra- 
tio for the difference signal case motivates the simu- 
lations which illustrate this property for unaltered 
and mean and median filtered gaussian, uniform, 
and combined noise corrupted gaussian and MR im- 
ages. In this investigation, the LIM demonstrates 
greater linearity with the MAE than the entropy and 
the minimum quantization number. Thus, the LIM 
varies approximately linearly with the MAE, providing
a similar performance measure without requiring
an uncorrupted reference signal for computation. 



REFERENCES 

[1] A. N. Kolmogorov, S. V. Fomin, Introductory Real Analysis,
Dover Publications, Inc., New York, NY, 1970.

[2] I. Daubechies, Ten Lectures on Wavelets, Society for
Industrial and Applied Mathematics, Philadelphia, PA,
1992.

[3] A. M. Pinkus, On L1-Approximation, Cambridge University
Press, Cambridge, UK, 1989.

[4] A. Torchinsky, Real Variables, Addison-Wesley Publishing
Company, Redwood City, CA, 1988.

[5] Y. Dodge, Editor, Statistical Data Analysis Based on the
L1-norm and Related Methods, Elsevier Science Publishing
Company, New York, NY, 1987.

[6] R. M. Capocelli, A. De Santis, I. J. Taneja, "Bounds on
the Entropy Series", IEEE Transactions on Information
Theory, Volume 34, Number 1, January, 1988, Pages
134-138.

[7] M. O. Lorenz, "Methods of Measuring the Concentration
of Wealth", Publications of the American Statistical
Association, Volume 9, 1905, Pages 209-219.

[8] A. W. Marshall, I. Olkin, Inequalities: Theory of
Majorization and Its Applications, Academic Press, New
York, NY, 1979.

[9] B. C. Arnold, Majorization and the Lorenz Order: A
Brief Introduction, Springer-Verlag, Berlin, Germany,
1987.

[10] S. K. Chang, Principles of Pictorial Information Systems
Design, Prentice Hall, Englewood Cliffs, NJ, 1989.

[11] G. H. Hardy, J. E. Littlewood, G. Polya, Inequalities,
2nd Edition, Cambridge University Press, London,
England, 1959.







Histogram Refinement for Content-Based Image Retrieval 

Greg Pass Ramin Zabih* 

Computer Science Department 
Cornell University 
Ithaca, NY 14853 
gregpass,rdz@cs.cornell.edu
http://www.cs.cornell.edu/home/rdz/refinement.html



Abstract 

Color histograms are widely used for content-based
image retrieval. Their advantages are efficiency, and
insensitivity to small changes in camera viewpoint.
However, a histogram is a coarse characterization of
an image, and so images with very different appearances
can have similar histograms. We describe a
technique for comparing images called histogram
refinement, which imposes additional constraints on
histogram based matching. Histogram refinement splits
the pixels in a given bucket into several classes, based
upon some local property. Within a given bucket, only
pixels in the same class are compared. We describe a
split histogram called a color coherence vector (CCV),
which partitions each histogram bucket based on spatial
coherence. CCV's can be computed at over 5 images
per second on a standard workstation. A database
with 15,000 images can be queried using CCV's in under
2 seconds. We demonstrate that histogram refinement
can be used to distinguish images whose color
histograms are indistinguishable.

1 Introduction 

Many applications require methods for comparing 
images based on their overall appearance. Color his- 
tograms are a popular solution to this problem, and 
are used in systems like QBIC [2] and Chabot [6]. 
Color histograms are computationally efficient, and 
generally insensitive to small changes in camera po- 
sition. However, a color histogram provides only a 
very coarse characterization of an image; images with 
similar histograms can have dramatically different
appearances. For example, the images shown in figure 1
have similar color histograms.

In this paper we describe a method which imposes
additional constraints on histogram based matching.
In histogram refinement, the pixels within a given
bucket are split into classes based upon some local
property. Split histograms are compared on a bucket
by bucket basis, similar to standard histogram matching.
Within a given bucket, only pixels with the same
property are compared. Two images with identical
color histograms can have different split histograms;
thus, split histograms create a finer distinction than
color histograms. This is particularly important for
large image databases, in which many images can have
similar color histograms.

Figure 1: Two images with similar color histograms

We have experimented with a split histogram called
a color coherence vector (CCV), which partitions pixels
based upon their spatial coherence. A coherent
pixel is part of some sizable contiguous region, while
an incoherent pixel is not. While the two images
shown in figure 1 have similar color histograms, their
CCV's are very different.¹ For example, red pixels
appear in both images in similar quantities. In the
left image the red pixels (from the flowers) are widely
scattered, while in the right image the red pixels (from
the golfer's shirt) form a single coherent region.

We begin with a review of color histograms. In section 3
we describe histogram refinement, and present
two examples that capture spatial information.
Section 4 provides examples of refinement-based image
queries and shows that they can give superior results
to color histograms. We compare our work with some
recent algorithms [5, 8, 9, 10] that also combine spatial
information with color histograms.

2 Color Histograms 

Color histograms are frequently used to compare
images. Examples of their use in multimedia applications
include scene break detection and querying a
database of images [7, 6, 2]. Color histograms are popular
because they are trivial to compute, and tend to
be robust against small changes in camera viewpoint.
For example, Swain and Ballard [12] describe the use
of color histograms for identifying objects. Stricker
and Swain [11] analyze the information capacity of
color histograms.

¹ The color images used in this paper can be found at
http://www.cs.cornell.edu/home/rdz/refinement.html.

We will assume that all images are scaled to contain
the same number of pixels $M$. We discretize the
colorspace of the image such that there are $n$ distinct
(discretized) colors. A color histogram $H$ is a vector
$(h_1, \ldots, h_n)$, in which each bucket $h_j$ contains the
number of pixels of color $j$ in the image. Typically
images are represented in the RGB colorspace, with a
few of the most significant bits per color channel.

For a given image $I$, the color histogram $H_I$ is a
compact summary of the image. A database of images
can be queried to find the most similar image to
$I$, and can return the image $I'$ with the most similar
color histogram $H_{I'}$. Color histograms are typically
compared using the $L_1$-distance or the $L_2$-distance,
although more complex measures have also been
considered [4].
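
The definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of 2 bits per channel (giving 64 buckets), and the representation of an image as a flat list of 8-bit (r, g, b) tuples are all assumptions made for illustration.

```python
def quantize(pixel, bits=2):
    """Map an 8-bit (r, g, b) pixel to a bucket index, keeping the
    `bits` most significant bits of each channel (assumed scheme)."""
    r, g, b = (channel >> (8 - bits) for channel in pixel)
    return (r << (2 * bits)) | (g << bits) | b


def color_histogram(pixels, bits=2):
    """Return the histogram (h_1, ..., h_n) with n = 2 ** (3 * bits)."""
    hist = [0] * (1 << (3 * bits))
    for pixel in pixels:
        hist[quantize(pixel, bits)] += 1
    return hist


def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Two tiny hypothetical "images" with the same number of pixels M = 4.
image_a = [(255, 0, 0)] * 3 + [(0, 0, 255)]
image_b = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
distance = l1_distance(color_histogram(image_a), color_histogram(image_b))
```

Here one red pixel has become blue, so two buckets each differ by one and the distance is 2.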

3 Histogram Refinement 

In histogram refinement the pixels of a given bucket 
are subdivided into classes based on local features. 
There are many possible features, including texture, 
orientation, distance from the nearest edge, relative 
brightness, etc. Histogram refinement prevents pixels 
in the same bucket from matching each other if they 
do not fall into the same class. Pixels in the same class
can be compared using any standard method for comparing
histogram buckets (such as the $L_1$ distance).
This allows fine distinctions that cannot be made with
color histograms.

As a simple example of histogram refinement, con- 
sider a positional refinement where each pixel in a 
given color bucket is classified as either "in the cen- 
ter" of the image, or not. Specifically, the centermost 
75% of the pixels are defined as the "center". This 
produces a split histogram in which the pixels of color 
buckets are loosely constrained by their location in 
the image. The resulting split histograms can be compared
using the $L_1$ distance. We will call this simple
form of histogram refinement centering refinement.
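
Centering refinement can be sketched as follows. The text does not say how the "centermost 75%" is delimited, so this sketch assumes a centered rectangle with the image's aspect ratio whose area is 75% of the image; on small grids the covered fraction is only approximate. The function names and the image representation (a 2-D list of bucket indices) are hypothetical.

```python
import math


def in_center(x, y, width, height, fraction=0.75):
    """True if the center of pixel (x, y) lies in the assumed central rectangle."""
    scale = math.sqrt(fraction)  # shrink each side so the area is `fraction`
    margin_x = width * (1 - scale) / 2
    margin_y = height * (1 - scale) / 2
    return (margin_x <= x + 0.5 <= width - margin_x
            and margin_y <= y + 0.5 <= height - margin_y)


def centering_split_histogram(image, n_buckets):
    """Return one [center_count, non_center_count] pair per color bucket."""
    height, width = len(image), len(image[0])
    split = [[0, 0] for _ in range(n_buckets)]
    for y, row in enumerate(image):
        for x, bucket in enumerate(row):
            split[bucket][0 if in_center(x, y, width, height) else 1] += 1
    return split


def split_l1(s1, s2):
    """L1 distance between two split histograms, class by class."""
    return sum(abs(a - b) for p, q in zip(s1, s2) for a, b in zip(p, q))


# Two 8 x 8 images with identical color histograms (63 pixels of bucket 0,
# one pixel of bucket 1) that differ only in where the odd pixel sits.
corner = [[0] * 8 for _ in range(8)]
corner[0][0] = 1
middle = [[0] * 8 for _ in range(8)]
middle[4][4] = 1
refined_distance = split_l1(centering_split_histogram(corner, 2),
                            centering_split_histogram(middle, 2))
```

The plain color histograms of the two images are identical, while the split histograms differ, which is the distinction refinement is meant to expose.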

3.1 Color coherence vectors

CCV's are a more sophisticated form of histogram 
refinement, in which histogram buckets are partitioned 
based on spatial coherence. Our coherence measure 
classifies pixels as either coherent or incoherent. A 
coherent pixel is a part of a sizable contiguous region, 
while an incoherent pixel is not. A color coherence
vector represents this classification for each color in 
the image. 

The initial stage in computing a CCV is similar to 
the computation of a color histogram. We first blur 
the image slightly by replacing pixel values with the 
average value in a small local neighborhood (currently 
including the 8 adjacent pixels). We then discretize
the colorspace, such that there are only n distinct col- 
ors in the image. 
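
The preprocessing stage can be sketched as below. This is an illustrative reading of the description, not the authors' code: the handling of border pixels (averaging over whichever neighbors exist), integer averaging, and the 2-bits-per-channel discretization are assumptions.

```python
def box_blur(image):
    """Replace each (r, g, b) pixel by the average over itself and its
    (up to) 8 adjacent pixels. `image` is a 2-D list of tuples."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            row.append(tuple(sum(p[c] for p in window) // len(window)
                             for c in range(3)))
        blurred.append(row)
    return blurred


def discretize(image, bits=2):
    """Map every pixel to one of n = 2 ** (3 * bits) bucket indices."""
    shift = 8 - bits
    return [[((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)
             for (r, g, b) in row]
            for row in image]


# A single gray pixel on a black 3 x 3 background is spread out by the blur.
spot = [[(0, 0, 0)] * 3 for _ in range(3)]
spot[1][1] = (90, 90, 90)
blurred_spot = box_blur(spot)
```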

The next step is to classify the pixels within a given
color bucket as either coherent or incoherent. A coherent
pixel is part of a large group of pixels of the same
color, while an incoherent pixel is not. We determine
the pixel groups by computing connected components.
A connected component $C$ is a maximal set of pixels
such that for any two pixels $p, p' \in C$, there is a path
in $C$ between $p$ and $p'$. We compute connected components
using 4-connected neighbors within a given
discretized color bucket. We classify each pixel as either
coherent or incoherent depending on the size in pixels
of its connected component. A pixel is coherent if the
size of its connected component exceeds a fixed value
$\tau$; otherwise, the pixel is incoherent.

For a given discretized color, some of the pixels
with that color will be coherent and some will be
incoherent. Let us call the number of coherent pixels of
the $j$'th discretized color $\alpha_j$ and the number of
incoherent pixels $\beta_j$. Clearly, the total number of pixels
with that color is $\alpha_j + \beta_j$, and so a color histogram
would summarize an image as $(\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n)$.
Instead, for each color we compute the pair $(\alpha_j, \beta_j)$
which we will call the coherence pair for the $j$'th color.
The color coherence vector for the image consists of
$((\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n))$. This is a vector of coherence
pairs, one for each discretized color.
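
The classification and counting steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is assumed to be already blurred and discretized into a 2-D list of bucket indices, connected components are found with a breadth-first search over 4-connected neighbors, and the names `color_coherence_vector` and `tau` are chosen here for readability.

```python
from collections import deque


def color_coherence_vector(image, n_buckets, tau):
    """Return [(alpha_1, beta_1), ..., (alpha_n, beta_n)] for `image`.

    A pixel counts as coherent when its 4-connected component of
    same-bucket pixels has more than `tau` pixels.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    alpha = [0] * n_buckets  # coherent pixel counts
    beta = [0] * n_buckets   # incoherent pixel counts
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0]:
                continue
            color = image[y0][x0]
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            size = 0
            while queue:  # flood-fill one connected component
                x, y = queue.popleft()
                size += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if size > tau:
                alpha[color] += size
            else:
                beta[color] += size
    return list(zip(alpha, beta))


# A 4 x 4 example with two buckets: one large region of bucket 0 and
# three small groups of bucket 1 (sizes 1, 1 and 2).
example = [[0, 0, 0, 1],
           [0, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 1, 1]]
ccv = color_coherence_vector(example, n_buckets=2, tau=2)
```

Each pixel is visited once, which matches the single-pass, small-constant cost per pixel that the text attributes to CCV computation.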

In our experiments, all images were scaled to contain
$M$ = 38,976 pixels, and we have used $\tau$ = 300
pixels (so a region is classified as coherent if its area is
about 1% of the image). With this value of $\tau$, an average
image in our database consists of approximately
75% coherent pixels, with a standard deviation of 11%.

Two images $I, I'$ can be compared using their
CCV's, for example by using the $L_1$ distance. Let the
coherence pairs for the $j$'th color bucket be $(\alpha_j, \beta_j)$ in
$I$ and $(\alpha'_j, \beta'_j)$ in $I'$. Using the $L_1$ distance to compare
CCV's, the $j$'th bucket's contribution to the distance
between $I$ and $I'$ is

$$\Delta_{CCV} = |\alpha_j - \alpha'_j| + |\beta_j - \beta'_j|. \qquad (1)$$

Note that when using color histograms to compare $I$
and $I'$, the $j$'th bucket's contribution is

$$\Delta_{CH} = |(\alpha_j + \beta_j) - (\alpha'_j + \beta'_j)|. \qquad (2)$$

It follows that CCV's create a finer distinction than
color histograms. A given color bucket $j$ can contain
the same number of pixels in $I$ as in $I'$, but these pixels
may be entirely incoherent in $I$ and entirely coherent
in $I'$ (i.e., $\alpha_j = \beta'_j = 0$). Formally, $\Delta_{CH} \le \Delta_{CCV}$
follows from equations 1 and 2, and the fact that the
$L_1$ distance obeys the triangle inequality.
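
Equations 1 and 2 can be checked numerically with a short sketch. The function names and the example coherence pairs are hypothetical; a CCV is represented as a list of (coherent, incoherent) count pairs.

```python
def ccv_distance(ccv1, ccv2):
    """Sum over buckets of equation 1: |a - a'| + |b - b'|."""
    return sum(abs(a1 - a2) + abs(b1 - b2)
               for (a1, b1), (a2, b2) in zip(ccv1, ccv2))


def histogram_distance(ccv1, ccv2):
    """Sum over buckets of equation 2: |(a + b) - (a' + b')|."""
    return sum(abs((a1 + b1) - (a2 + b2))
               for (a1, b1), (a2, b2) in zip(ccv1, ccv2))


# Bucket 0 is entirely incoherent in the first image and entirely
# coherent in the second; bucket 1 differs only slightly.
ccv_first = [(0, 500), (300, 200)]
ccv_second = [(500, 0), (250, 250)]
d_ch = histogram_distance(ccv_first, ccv_second)
d_ccv = ccv_distance(ccv_first, ccv_second)
```

For these values the color histograms are identical (distance 0) while the CCV distance is 1100, consistent with the inequality stated above.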

4 Experimental Results

We have implemented histogram refinement, and
have used it for image retrieval from a large database.
Our database consists of 14,554 images, which are
drawn from a variety of sources. Our largest sources
include the 11,667 images used in Chabot [6], the 1,440
images used in QBIC [2], and a 1,005 image database
available from Corel. In addition, we included a few
groups of images in PhotoCD format. Finally, we have
taken a number of MPEG videos from the Web and
segmented them into scenes. We have added one or
two images from each scene to the database, totaling
349 images. The image database thus contains a wide
variety of imagery.

We have compared our results with a number of
color histogram variants. These include the $L_1$ and
$L_2$ distances, with both 64 and 512 color buckets.
We include a small amount of smoothing as it empirically
improved performance. On our database, the
$L_1$ distance with the 64-bucket RGB colorspace gave
the best results, and is used as a benchmark.

Hand examination of our database revealed 75 
pairs of images which contain different views of 
the same scene. Examples are shown in figures 2 
and 3. One image is selected as a query image, 
and the other represents a "correct" answer. In each 
case, we have shown where the second image ranks, 
when similarity is computed using color histograms 
or using histogram refinement. Specifically, results 
are shown using CCV's, centering refinement, and 
a successive refinement technique described in
section 6.1. The color images shown are available at
http://www.cs.cornell.edu/home/rdz/refinement.html.

4.1 Centering refinement results 

In 69 of the 75 cases, centering refinement pro- 
duced better results, while in 4 cases it produced worse 
results (there were 2 cases where the ranks did not 
change). The average change in rank due to centering
refinement was an improvement of 55 positions
(this included all 75 cases). The average percentage
change in rank was an improvement of 41%. In the 
69 cases where centering refinement performed bet- 
ter than color histograms, the average improvement 
in rank was 60 positions, and the average percentage 
improvement was 49%. In the 4 cases where color his- 
tograms performed better than centering refinement, 
the average rank improvement was 10 positions. We 
have not yet analyzed these 4 cases to determine why 
centering refinement fails. 

To analyze the statistical significance of this data,
we formulate the null hypothesis $H_0$ which states that
centering refinement is equally likely to cause a positive
change in ranks (i.e., an improvement) or a negative
change. We will discard the 2 ties to simplify
the analysis. Under $H_0$, the expected number of positive
changes is 36.5, with a standard deviation of
$\sqrt{73}/2 \approx 4.27$. The actual number of positive changes
is 69, which is more than 7.6 standard deviations
greater than the number expected under $H_0$. We can
therefore reject $H_0$ at any standard significance level
(such as 99.9%).

4.2 CCV results 

In 68 of the 75 cases, CCV's produced better results,
while in 7 cases they produced worse results.
The average change in rank due to CCV's was an
improvement of 68 positions (note that this included the
7 cases where CCV's do worse). The average percentage
change in rank was an improvement of 35%. In the
68 cases where CCV's performed better than color
histograms, the average improvement in rank was 77
positions, and the average percentage improvement was
56%. In the 7 cases where color histograms performed
better, the average improvement was 17 positions.



The null hypothesis $H_0$ states that CCV's are
equally likely to cause a positive change in ranks (i.e.,
an improvement) or a negative change. Under $H_0$,
the expected number of positive changes is 37.5, with
a standard deviation of $\sqrt{75}/2 \approx 4.33$. The actual
number of positive changes is 68, which is more than 7
standard deviations greater than the number expected
under $H_0$. We can therefore reject $H_0$ at any standard
significance level (such as 99.9%).
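
The arithmetic behind the two significance arguments (sections 4.1 and 4.2) can be reproduced with a short sketch. Under the null hypothesis the number of positive changes is binomial with $p = 1/2$, so its mean is $n/2$ and its standard deviation is $\sqrt{n}/2$. The function name is hypothetical, and the exact one-sided tail probability is an addition for illustration that the text itself does not report.

```python
import math


def sign_test(positives, negatives):
    """Return (expected, std_dev, z, exact_tail) under a fair-coin null."""
    n = positives + negatives
    expected = n / 2
    std_dev = math.sqrt(n) / 2
    z = (positives - expected) / std_dev
    # Exact probability of seeing at least `positives` successes in n trials.
    exact_tail = sum(math.comb(n, k) for k in range(positives, n + 1)) / 2 ** n
    return expected, std_dev, z, exact_tail


centering = sign_test(69, 4)  # 2 ties discarded, so n = 73
ccv_result = sign_test(68, 7)  # no ties, so n = 75
```

Both cases give a deviation of more than 7 standard deviations from the expected count, in line with the figures quoted above.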

When CCV's produced worse results, it was always 
due to a change in overall image brightness (i.e., the 
two images were almost identical, except that one was 
brighter than the other). Because CCV's use dis- 
cretized color buckets for segmentation, they are more 
sensitive to changes in overall image brightness than 
color histograms. We believe that this difficulty can 
be overcome by using a better colorspace than RGB, 
as we discuss in section 6.2. 

4.3 Efficiency 

We have experimented with a number of different 
techniques for histogram refinement. CCV's are the 
most computationally expensive method of these, and
will be our focus in discussing efficiency. 

There are two phases to the computation involved
in querying an image database. First, when an image
is inserted into the database, a CCV must be
computed. Second, when the database is queried,
some number of the most similar images must be
retrieved. Most methods for content-based indexing
include these distinct phases. For both color histograms
and CCV's, these phases can be implemented in linear
time with a single pass over the image.

We ran our experiments on a 50 MHz SPARCstation 20,
and provide the results from color histogramming
for comparison. Color histograms can be computed
at 67 images per second, while CCV's can be
computed at 5 images per second. Using color
histograms, 21,940 comparisons can be performed per
second, while with CCV's 7,746 can be performed
per second. The images used for benchmarking are
232 × 168. Both implementations are preliminary, and
the performance can definitely be improved.

5 Related Work 

Our work has focused on the use of spatial information
to refine color histograms. Recently, several
authors have proposed algorithms for comparing images
that combine spatial information with color
histograms. Hsu et al. [5] attempt to capture the spatial
arrangement of the different colors in the image, in
order to perform more accurate content-based image
retrieval. Rickman and Stonham [8] randomly sample
the endpoints of small triangles and compare the
distributions of these triplets. Smith and Chang [9]
concentrate on queries that combine spatial information
with color. Stricker and Dimai [10] divide the
image into five partially overlapping regions and compute
the first three moments of the color distributions
in each image. We will discuss each approach in turn.

Hsu [5] begins by selecting a set of representative
colors from the image. Next, the image is partitioned
into rectangular regions, where each region is
predominantly a single color. The partitioning algorithm
makes use of maximum entropy. The similarity between
two images is the degree of overlap between regions
of the same color. Hsu presents results from a
database with 260 images, which show that their approach
can give better results than color histograms.

Histogram: 198. Centering refinement: 42. CCV: 33. Successive refinement: 6.
Histogram: 78. Centering refinement: 54. CCV: 12. Successive refinement: 7.
Figure 2: Example queries with their partner images, plus ranks under various methods. Lower ranks indicate
better performance.

Histogram: 88. Centering refinement: 35. CCV: 20. Successive refinement: 13.
Histogram: 310. Centering refinement: 214. CCV: 177. Successive refinement: 160.
Histogram: 411. Centering refinement: 282. CCV: 84. Successive refinement: 56.
Histogram: 50. Centering refinement: 37. CCV: 27. Successive refinement: 22.
Figure 3: Additional example queries with ranks. Lower ranks indicate better performance.

While the authors do not report running times, 
it appears that Hsu's method requires substantially 
more computation than the approach we describe. A 
CCV can be computed in a single pass over the image, 
with a small number of operations per pixel. Hsu's 
partitioning algorithm in particular appears much 
more computationally intensive than our method. 
Hsu's approach can be extended to be independent 
of orientation and position, but the computation in- 
volved is quite substantial. In contrast, our method is 
naturally invariant to orientation and position. 

Rickman and Stonham [8] randomly sample pixel 
triples arranged in an equilateral triangle with a fixed 
side length. They use 16 levels of color hue, with non- 
uniform quantization. Approximately a quarter of the 
pixels are selected for sampling, and their method 
stores 372 bits per image. They report results from 
a database of 100 images. 

Smith and Chang's algorithm also partitions the 
image into regions, but their approach is more elabo- 
rate than Hsu's. They allow a region to contain multi- 
ple different colors, and permit a given pixel to belong 
to several different regions. Their computation makes 
use of histogram back-projection [12] to back-project 
sets of colors onto the image. They then identify color 
sets with large connected components. 

Smith and Chang's image database contains 3,100 
images. Again, running times are not reported, al- 
though their algorithm does speed up back-projection 
queries by pre-computing the back-projections of certain
color sets. Their algorithm can also handle certain
kinds of queries that our work does not address;
for example, they can find all the images where the 
sun is setting in the upper left part of the image. 

Stricker and Dimai [10] compute moments for each
channel in the HSV colorspace, where pixels close to 
the border have less weight. They store 45 floating 
point numbers per image. Their distance measure for 
two regions is a weighted sum of the differences in 
each of the three moments. The distance measure 
for a pair of images is the sum of the distance be- 
tween the center regions, plus (for each of the 4 side 
regions) the minimum distance of that region to the 
corresponding region in the other image, when rotated 
by 0, 90, 180 or 270 degrees. Because the regions over- 
lap, their method is insensitive to small rotations or 
translations. Because they explicitly handle rotations 
of 0, 90, 180 or 270 degrees, their method is not af- 
fected by these particular rotations. Their database 
contains over 11,000 images, but the performance of 
their method is only illustrated on 3 example queries. 
Like Smith and Chang, their method is designed to 
handle certain kinds of more complex queries that we 
do not consider. 



6 Extensions 

There are a number of ways in which our histogram
refinement could be extended and improved. One
generalization is to further subdivide split histograms
based on additional features; we refer to this process
as successive refinement. Another extension centers
on improving the choice of colorspace.

6.1 Successive refinement 

In successive refinement the buckets in a split
histogram are further subdivided based on additional
features. Much as we distinguish between pixels of similar
color by coherence, we can distinguish between
pixels of similar coherence by some additional feature.
We can apply this method repeatedly; each refinement 
imposes an additional constraint on what it means for 
two pixels to be similar. 

We have implemented a simple successively refined 
histogram. A color histogram was first split with co- 
herence constraints (to create a CCV). Successive re- 
finement was enforced on both the coherent and in- 
coherent pixels of the CCV. We used the centering 
refinement introduced in section 3. With successive 
refinement, pixels are divided into four classes based 
on coherence versus incoherence, and on whether or
not they were in the centermost 75% of the image.
The $L_1$ distance was used as a comparison measure.
Examples of the successively refined histogram's per- 
formance are shown in figures 2 and 3. These prelim- 
inary results seem promising. 
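
The four-class split described above can be sketched as follows. This is an illustration only: it assumes each pixel has already been labeled with its color bucket, its coherence, and whether it lies in the center of the image, and the function names and class ordering are hypothetical.

```python
def successively_refined_histogram(pixels, n_buckets):
    """Count pixels per bucket in four classes, ordered as
    [coherent+center, coherent+edge, incoherent+center, incoherent+edge].

    `pixels` is an iterable of (bucket, is_coherent, is_center) triples.
    """
    hist = [[0, 0, 0, 0] for _ in range(n_buckets)]
    for bucket, is_coherent, is_center in pixels:
        cls = (0 if is_coherent else 2) + (0 if is_center else 1)
        hist[bucket][cls] += 1
    return hist


def refined_l1(h1, h2):
    """L1 distance between two refined histograms, class by class."""
    return sum(abs(a - b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))


# Hypothetical labeled pixels for two images with identical color histograms.
pixels_a = [(0, True, True), (0, True, False), (1, False, True)]
pixels_b = [(0, True, True), (0, False, False), (1, False, True)]
hist_a = successively_refined_histogram(pixels_a, 2)
hist_b = successively_refined_histogram(pixels_b, 2)
```

Each further refinement multiplies the number of classes per bucket, so the same comparison code works for any number of binary features once the class index is computed accordingly.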

We have also investigated successive refinement 
based on intensity gradients. Again, the initial re- 
finement was based on coherence, and the successive 
refinement was enforced identically on coherent and 
incoherent pixels. We have further classified pixels 
based on the gradient magnitude or on the gradient 
direction. The results we obtained are quite prelimi- 
nary, but they seem to indicate a statistically signifi- 
cant improvement over CCV's. 

The best system of constraints to impose on the 
image is an open issue. Any combination of features 
might give effective results, and there are many possi- 
ble features to choose from. However, it is possible to 
take advantage of the temporal structure of a succes- 
sively refined histogram. One feature might serve as 
a filter for another feature, by ensuring that the sec- 
ond feature is only computed on pixels which already 
possess the first feature. 

For example, the perimeter-to-area ratio can be 
used to classify the relative shapes of color regions. 
If we used this ratio as an initial refinement on color 
histograms, incoherent pixels would result in statisti- 
cal outliers, and thus give questionable results. This 
feature is better employed after the coherent pixels 
have been segregated. Refining a histogram not only
makes finer distinctions between pixels, but functions 
as a statistical filter for successive refinements. 

6.2 Choice of colorspace 

Many researchers spend considerable effort on se- 
lecting a good set of colors. Hsu [5], for example, 
assumes that the colors in the center of the image 
are more important than those at the periphery, while 
Smith and Chang [9] use several different thresholds to
extract colors and regions. A wide variety of different
colorspaces have also been investigated for content-based
image retrieval, such as the opponent-axis
colorspace [12] and the Munsell colorspace [2].

The choice of colorspace is a particularly significant
issue for CCV's, since they use the discretized
color buckets to segment the image. A perceptually 
uniform colorspace, such as CIE Lab, should result in 
better segmentations and improve the performance of 
CCV's. A related issue is the color constancy prob- 
lem, which causes objects of the same color to ap- 
pear rather differently depending upon the lighting 
conditions. The simplest effect of color constancy is a 
change in overall image brightness; this is responsible 
for the negative examples obtained in our experiments
with CCV's. Standard histogramming methods are 
sensitive to image gain. More sophisticated methods, 
such as color ratio histograms [3] or the use of color 
moments [10], might alleviate this problem. These 
methods, like most proposed improvements to color 
histograms, can also be used in histogram refinement. 
For example, color moments could be computed sepa- 
rately for coherent and incoherent pixels. 

7 Conclusions 

We have described a method for imposing additional
constraints on histogram based matching called
histogram refinement. This idea can be extended by 
placing further constraints on the split histogram it- 
self. Both histogram refinement and successive refine- 
ment are general methods for improving the perfor- 
mance of histogram based matching. If the initial his- 
togram is a color histogram, and it is refined based 
on coherence, then the resulting split histogram is a 
CCV. But there is no requirement that this refine- 
ment be based on coherence, or even that the initial 
histogram be based on color. 

Most research in content-based image retrieval has 
focused on query by example (where the system au- 
tomatically finds images similar to an input image). 
However, other types of queries are also important. 
For example, it is often useful to search for images
in which a subset of another image (e.g. a particu- 
lar object) appears. This would be particularly useful 
for queries on a database of videos. One approach to 
this problem might be to generalize histogram back- 
projection [12] to separate pixels based on spatial co- 
herence, or some other local property. 

It is clear that larger and larger image databases
will demand more complex similarity measures. This
added time complexity can be offset by using efficient,
coarse measures that prune the search space by removing
images which are clearly not the desired answer.
Measures which are less efficient but more effective can
then be applied to the remaining images. Baker and
Nayar [1] have begun to investigate similar ideas for
pattern recognition problems. To effectively handle
large image databases will require a balance between
increasingly fine measures (such as histogram refinement)
and efficient coarse measures.
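
One way to picture this coarse-then-fine strategy is the sketch below, which uses the color histogram distance as the coarse measure and the CCV distance as the fine one. It is an illustration under assumptions, not a method described by the authors: the function names, the dictionary-based database, and the fixed shortlist size `keep` are hypothetical, and a fixed-size shortlist can in principle discard the image that the fine measure would have ranked first.

```python
def histogram_distance(ccv1, ccv2):
    """Coarse measure: L1 distance between the implied color histograms."""
    return sum(abs((a1 + b1) - (a2 + b2))
               for (a1, b1), (a2, b2) in zip(ccv1, ccv2))


def ccv_distance(ccv1, ccv2):
    """Fine measure: L1 distance between the coherence pairs."""
    return sum(abs(a1 - a2) + abs(b1 - b2)
               for (a1, b1), (a2, b2) in zip(ccv1, ccv2))


def coarse_to_fine_query(query, database, keep):
    """Shortlist the `keep` closest images by the coarse measure,
    then re-rank only that shortlist by the fine measure."""
    coarse_order = sorted(database,
                          key=lambda name: histogram_distance(query, database[name]))
    shortlist = coarse_order[:keep]
    return sorted(shortlist, key=lambda name: ccv_distance(query, database[name]))


# Hypothetical two-bucket database, keyed by image name.
database = {
    "a": [(0, 100), (50, 0)],
    "b": [(90, 10), (0, 50)],
    "c": [(10, 0), (0, 10)],
}
query = [(100, 0), (0, 50)]
ranking = coarse_to_fine_query(query, database, keep=2)
```

Images "a" and "b" both match the query's color histogram exactly, so the coarse pass cannot separate them; the fine pass then places "b" first.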
Abstract

Applying information retrieval techniques to the
World Wide Web (WWW) environment is a unique
challenge, mostly because of its hypertext/hypermedia
nature and the richness of the meta-information it
provides. We present four keyword-based search and
ranking algorithms for locating relevant WWW pages
with respect to user queries. The first algorithm,
Boolean Spreading Activation, extends the notion of
word occurrence in the Boolean retrieval model by
propagating the occurrence of a query word in a page to
other pages linked to it. The second algorithm,
Most-cited, uses the number of citing hyperlinks between
potentially relevant WWW pages to increase the relevance
scores of the referenced pages over the referencing
pages. The third algorithm, TFxIDF vector space
model, is based on word distribution statistics. The
last algorithm, Vector Spreading Activation, combines
TFxIDF with the spreading activation model. We
conducted an experiment to evaluate the retrieval
effectiveness of these algorithms. From the results of the
experiment, we draw conclusions regarding the nature
of the WWW environment with respect to document
ranking strategies.

1 Introduction 

The World Wide Web (WWW) [4] has become one
of the fastest growing applications on the Internet today.
Its popularity can be attributed mainly to its
uniform access method for various network information
services and its hypermedia support which links
a wide range of multimedia data physically distributed
all around the world into a single gigantic virtual
database. WWW also provides a powerful and easy
to setup medium for almost any user on the Internet
to disseminate information. More and more information
has become available online through WWW, from
personal data to scientific reports to up-to-the-minute
satellite images. This information explosion leads to a
problem commonly known as the resource discovery
problem [14]. In order to find interesting WWW pages, a
user has to browse through many WWW sites. This
is a very time consuming process.

Methods to relieve users of this information
overflow problem have been explored by others, from
creating a special Usenet [8] newsgroup¹ for announcing
new WWW sites, to sharing personal hot lists
(accessible from the owner's home pages), compiled lists
and catalogs, to searchable full-text index databases.
In this paper, we present a WWW index server designed
to help users locate WWW pages using keyword
search. Based on how the index is built, there
are two categories of WWW searchable index servers,
namely manually generated index servers and robot
generated index servers.

Among the well known manually built index servers
are Yahoo² and EInet Galaxy.³ The main advantage
of manual indexing is that Web pages can be organized,
hierarchically or otherwise, by subject, such as
the subject tree structure in Yahoo and EInet Galaxy.
Of course, such categorizations are subjective and may
be biased to the maintainer's knowledge and background.
A slightly different scheme of manually generated
index system is the one used by the Global
Online Directory (GOLD⁴), among others, which allows
any user to add an entry (a pointer to a Web page
along with other information) into the index database.
Similar to the above scheme is that of the Archie-like
Web server (ALIWEB⁵) [7]. Instead of users registering
their WWW pages, ALIWEB retrieves index
data from each of the participating WWW servers.
This index data is prepared manually by the respective
WWW server maintainers in a standard text format
containing the description of information provided by
the servers.

Our index server falls into the category of robot-generated
index servers. Robot-based indexing is
faster and more comprehensive than manual indexing
and, since it is automated, it is easy to update the
index as often as necessary. We discuss some of the
robot-based index servers on the Internet in the last
section of this paper.

¹ comp.infosystems.www.announce newsgroup.
² <http://www.yahoo.com/>.
³ <http://galaxy.einet.net/>.
⁴ <http://www.gold.net/gold/>.
⁵ <http://web.nexor.co.uk/public/aliweb/aliweb.html>.

Figure 1: Multiserver organization and query routing.

Our WWW index server⁶ takes a query from a user
and returns a list of URL's (Uniform Resource Locators [1]
or WWW page addresses) along with their
titles, ranked by relevance score. Hyphens may be 
used to specify phrases so that the search algorithm 
only searches for occurrences of words in the same 
word ordering as they are in the phrase. For exam- 
ple, the query "computer-science" matches only pages 
containing the word "computer" immediately followed 
by the word "science". Our index server also allows a
user to save a query, along with an optional single-line 
comment, on the server so that other users can share 
his/her discovery. Saved queries are stored in a list 
of clickable query statements. By clicking on one of 
these statements, a user can resubmit the query to the 
index server. 

The index server is a key component of the Dis- 
tributed World Wide Web Index and Search Engine 
(D-WISE) project, which is being conducted at the 
Hong Kong University of Science and Technology. Fig- 
ure 1 illustrates the global architecture of D-WISE, 
where an index server covers a number of WWW sites 
belonging to a group based on geographical location,
institution or other categories. For instance, the index 
server currently operational covers most of the sites in 
Hong Kong; similar index servers may be developed for 
a particular institution (e.g., covering all of the NASA 
sites). Each of such servers reports to one or more spe- 
cial servers, called the brokers. A broker maintains a 
catalog of meta-information which describes the top- 
ics covered by each of the index servers. A user client 
can send a query to one of the brokers. The broker 
then redirects the query to the index server which can 
potentially provide the best answer. This approach 
is similar to that of other server indexing methods
such as GLOSS [6]. The emphasis of D-WISE is more
on the schemes for meta-data exchange between index
servers and brokers, and for automatic index server
categorization. The details of D-WISE are beyond the
scope of this paper. In this paper, we describe the
design and implementation of WISE, the index server,
as a stand-alone system.

^6 The WWW index server currently covers most WWW
servers in Hong Kong, and is publicly accessible at
<http://www.cs.ust.hk/cgi-bin/IndexServer>.

It is worth noting that our goal is not to replace
browsing with keyword search, but to supplement it.
We believe that browsing is an intuitive and appealing
paradigm for accessing information. However,
the search paradigm can bring the user closer to
potentially relevant sites or pages quickly, from which
browsing can be used to further explore interesting
sites or pages in the neighborhood. The objective of
this paper is to explore and evaluate several different
search strategies, including new ones which are
designed to take advantage of hyperlink and other
meta-information specific to the WWW environment.

The rest of the report is organized as follows. In
section 2, we describe the design of our index server
in general. Section 3 presents the search algorithms
in detail. Section 4 discusses the experiment we
conducted to evaluate the search algorithms. Finally, in
section 5 we discuss the conclusions drawn from the 
results of the experiment, plans for future study, and 
some comparisons with other WWW index servers. 

2 System Description 

Our WWW index server consists of two main com- 
ponents: an indexer robot and a search engine. Figure 
2 illustrates the system architecture. 

2.1 Indexer Robot 

The indexer robot is an autonomous WWW
browser which communicates with WWW servers using
HTTP (Hypertext Transfer Protocol [2]). It visits
a given WWW site, traverses hyperlinks in a
breadth-first manner, retrieves WWW pages, extracts
keywords and hyperlink data from the pages, and inserts
the keywords and hyperlink data into an index.

The index consists of a page-ID table, a keyword-ID 
table, a page-title table, a page modification-date
table, a hyperlink table, and two index tables, namely,
an inverted index and a forward index. The page- 
ID table maps each URL to a unique page-ID. The 
keyword-ID table maps each keyword to a unique 
keyword-ID. The page-title table maps every page-ID 
to the page's title. The page modification-date table
maps every page-ID to the date when the page was
last visited by the indexer robot. The hyperlink
table maps each page-ID to two arrays of page-IDs,
one array representing a list of pages referencing the 
page (incoming hyperlinks) and the other array repre- 
senting a list of pages referenced by the page (outgo- 
ing hyperlinks). The inverted index table maps each 
keyword-ID to an array of (page-ID, word-position) 
pairs, each of which represents a page containing the 
keyword and the position of the word (order num- 
ber) in the page. This word-position information is 
used in phrase searches mentioned in the introduc- 
tion. Such information may also be useful for search 
strategies which take into account the distances be- 
tween keywords. In this study, we do not investigate 
such strategies. The forward-index table maps a
page-ID to an array of keyword-IDs representing keywords
contained in the page. To obtain a fast access speed,
a hashing method is used to index each of these tables
on the page-ID or keyword-ID attribute. The decision 
to store the index in separate tables instead of two 
large tables, one indexed by page-ID and the other in- 
dexed by keyword-ID, was based on its ease of main- 
tenance and modularity. Moreover, the objective of 
this project is to develop algorithms that would work 
best in the WWW environment in terms of precision 
and recall. Thus, efficiency is not a primary concern
at this stage of the project. 
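
The table layout described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the authors' C/GDBM implementation: the class name `Index`, its methods, and the use of plain dictionaries in place of hashed database files are all assumptions made for the example. It shows how the (page-ID, word-position) pairs of the inverted index support the phrase search mentioned in the introduction.

```python
from collections import defaultdict


class Index:
    """Toy in-memory stand-in for the page-ID, keyword-ID,
    inverted-index and forward-index tables (hypothetical names)."""

    def __init__(self):
        self.page_ids = {}                  # URL -> page-ID
        self.keyword_ids = {}               # keyword -> keyword-ID
        self.inverted = defaultdict(list)   # keyword-ID -> [(page-ID, position)]
        self.forward = defaultdict(list)    # page-ID -> [keyword-ID]

    @staticmethod
    def _id(table, key):
        # Assign the next unused integer ID the first time a key is seen.
        return table.setdefault(key, len(table) + 1)

    def add_page(self, url, words):
        """Index the ordered keyword list of one page; return its page-ID."""
        pid = self._id(self.page_ids, url)
        for pos, word in enumerate(words):
            kid = self._id(self.keyword_ids, word)
            self.inverted[kid].append((pid, pos))
            if kid not in self.forward[pid]:
                self.forward[pid].append(kid)
        return pid

    def phrase_search(self, phrase):
        """Return page-IDs in which the words of `phrase` occur consecutively."""
        kids = [self.keyword_ids.get(w) for w in phrase]
        if not kids or None in kids:
            return set()
        # Candidate (page, start position) pairs come from the first word.
        hits = set(self.inverted[kids[0]])
        for offset, kid in enumerate(kids[1:], start=1):
            postings = set(self.inverted[kid])
            hits = {(p, s) for (p, s) in hits if (p, s + offset) in postings}
        return {p for p, _ in hits}
```

With this model, a query such as "computer-science" corresponds to `phrase_search(["computer", "science"])`, which keeps only pages where the second word's position is exactly one greater than the first word's.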

In extracting the keywords, we exclude high-frequency
function words (stop-words), numbers,
computer-specific identifiers such as file names, file
directory paths, email addresses, network host names, and
HTML (Hypertext Markup Language [3]) tags. To 
reduce storage overhead, the indexer robot only in- 
dexes words enclosed by HTML tags indicating tokens 
such as page titles, headings, hyperlink anchor names, 
words in bold-face, words in italic, and words in the 
first sentence of every list item. We assume that a 
WWW author will use these tags only on important 
sentences or words in his/her WWW pages. Thus, 
these words make good page identifiers. This is one of 
the advantages of adopting SGML (Standard Generalized
Markup Language), of which HTML is an application. Of
course, there may be other important words which are 
not enclosed by any of the above HTML tags. Words 
chosen as keywords are then stemmed by removing 
their suffixes.
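
The tag-based keyword selection can be illustrated with Python's standard `html.parser` module. The sketch below is a simplified assumption-laden model rather than the robot's actual code: the tag set `INDEXED_TAGS` and the tiny stop list `STOP_WORDS` are hypothetical, list items are not handled, and no stemming is performed. Tokens that are not purely alphabetic are dropped, which crudely approximates the exclusion of numbers, file names, email addresses and host names.

```python
from html.parser import HTMLParser

# Hypothetical tag set and stop list, chosen for illustration only.
INDEXED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a", "b", "i"}
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "is"}


class KeywordExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0        # number of indexed tags currently open
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag in INDEXED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in INDEXED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            return            # text outside the selected tags is not indexed
        for token in data.lower().split():
            word = token.strip(".,;:!?()\"'")
            # Keep alphabetic words that are not stop-words.
            if word.isalpha() and word not in STOP_WORDS:
                self.keywords.append(word)


def extract_keywords(html):
    """Return the keywords found inside the indexed tags, in page order."""
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords
```

Because the result preserves page order, the list can be passed directly to an indexer that records word positions.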

Resources using protocols other than HTTP (FTP, 
Gopher, Telnet, etc.) or in formats other than HTML 
text file (non-inline image, sound, video, binary, and
other text files), click-able image maps, and CGI
scripts are indexed by the anchor texts referencing 
them. 

Periodic maintenance of the index files is performed 
bi-weekly by the indexer robot. First, the indexer 
robot checks the validity of every URL entry in the 
database by sending a special request to the WWW 
server containing the page to check whether the page 
has been modified since the time it was last accessed 
by the indexer robot. This special request, known as 
a HEAD request, is a feature supported by HTTP.
Non-routine index maintenance is also supported. This is
performed at night in response to user requests
received during the day to index new pages (URLs) or
re-index updated pages. Such user requests are fa- 
cilitated by an electronic form provided by the index 
server. 

Our indexer robot has the capability of detecting 
looping paths, e.g., those caused by Unix symbolic 
links, and does not visit the same page more than once 
unless the page has been modified since the time it was 
last visited by the robot. The latter is made possible 
by supplying the last access date and time into the 
HTTP request.^7 As specified in the HTTP
specification [2], the remote WWW server will not send the
page content in response to the request if the page has 
not been modified since the specified time. Further- 
more, the robot will not even send an HTTP request 
if the page was last accessed within the last 24 hours. 
This is to prevent the robot from sending more than 
one HTTP request for the same page during a
maintenance batch. To prevent the robot from endlessly
roaming around from one server to the next, the robot
accesses pages at one site at a time, and only references
within the same site domain as that of the referencing
page are traversed. Finally, the robot supports the
proposed standard for robot exclusion,^8 which prevents
the robot from accessing places where, for various rea- 
sons, it is not welcome. 
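
The traversal rules above (breadth-first order, no revisiting of pages, and no following of off-site references) can be sketched as follows. This is a minimal model under stated assumptions, not the robot itself: `crawl_site` and the `fetch_links` callback are hypothetical names, the callback stands in for the HTTP request and HTML parsing, and the If-Modified-Since check, the 24-hour rule and robot exclusion are deliberately left out.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_site(start_url, fetch_links):
    """Breadth-first traversal limited to the host of `start_url`.

    `fetch_links(url)` is a hypothetical callback returning the hyperlinks
    found on a page; a real robot would issue an HTTP request here.
    """
    site = urlparse(start_url).netloc
    visited = {start_url}          # guards against looping paths
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            target = urljoin(url, link)     # resolve relative references
            if urlparse(target).netloc != site:
                continue                    # off-site references are not traversed
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return order
```

Passing the page fetcher in as a callback keeps the traversal logic testable against a small dictionary of pages without any network access.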

The indexer robot and the WWW robot are written
in the C language. All index files are implemented us- 
ing the GNU GDBM Database Manager library pack- 
age [9]. 

2.2 Search Engine 

The user interface to the search engine is an HTML
form which can be invoked by standard WWW clients 
such as Mosaic and Netscape. The user types in the 
keywords and clicks on a submit button to send the 
query to the search engine. 

Upon receiving a query, the search engine executes
one of the ranking algorithms [remainder of sentence
illegible in source] user can access the physical page.
Since the index database contains all of the information
needed for ranking, the ranking process does not need
access to any WWW pages physically. If desired, the user
can specify the maximum number of URLs to return
and the ranking algorithm to use, instead of using the
default setting. We discuss the ranking algorithms in
detail in the next section.

^7 Using the If-Modified-Since request header field.
^8 <http://info.webcrawler.com/mak/projects/robots/norobots.html>.

It is clear that it is infeasible to search the WWW 
pages directly to compute the relevance scores with- 
out the help of the index. However, maintaining the 
currency of the index is a problem. We consider this 
a necessary tradeoff between speed and timeliness of 
the results. An immediate solution is to increase the 
frequency of index rebuild (notice that only the part of 
the index updated needs to be rebuilt). We are inves- 
tigating within the D-WISE project efficient ways of 
organizing the index to facilitate detection of updates 
and reorganization in a WWW server. 

The search engine and its gateway mechanism run 
as CGI scripts (external programs executable by a 
WWW server on behalf of WWW clients using a
standard mechanism called Common Gateway Interface).
These scripts are written in C. Our current server
is running under NCSA HTTPD version 1.3 WWW 
server. 



3 Ranking Algorithms 

In this paper, we explore four ranking algorithms,
namely, (1) Boolean Spreading Activation, (2) Most-cited,
(3) TFxIDF, and (4) Vector Spreading Activation.
The first two algorithms rely on WWW
meta-information, namely, the hyperlink structure, for
ranking the WWW pages without considering term
frequencies. The TFxIDF method is based on word
occurrence statistics [12], whereas the Vector Spreading
Activation method makes use of both word occurrence
statistics and the hyperlink structure in ranking.

3.1 Terminology and Notations

In the WWW environment, a document is commonly
referred to as a WWW page. In this paper, we
use the terms page and document interchangeably. The
following notations are used in the rest of this paper.

$M$ : the number of query words.

$Q_j$ : the $j$-th query word, for $1 \le j \le M$.

$N$ : the number of WWW pages in the index database.

$P_i$ : the $i$-th WWW page or its ID number, for
$1 \le i \le N$.

$R_{i,q}$ : the relevance score of $P_i$ with respect to
query $q$.

$\mathit{Li}_{i,k}$ : the occurrence of an incoming hyperlink
from $P_k$ to $P_i$, where $\mathit{Li}_{i,k} = 1$ if such a
hyperlink exists, or 0 otherwise.

$\mathit{Lo}_{i,k}$ : the occurrence of an outgoing hyperlink
from $P_i$ to $P_k$, where $\mathit{Lo}_{i,k} = 1$ if such a
hyperlink exists, or 0 otherwise.

$C_{i,j}$ : the occurrence of $Q_j$ in $P_i$, where
$C_{i,j} = 1$ if $P_i$ contains $Q_j$, or 0 otherwise.



3.2 Description of the Algorithms 
3.2.1 Boolean Spreading Activation 

This algorithm is based on the Boolean retrieval 
model, where retrieval is based solely on the occur- 
rence or absence of keywords in the documents. We 
extend the Boolean model so that documents can be 
ranked based on the number of query words they con- 
tain. This strategy can be considered as a simple fuzzy 
set retrieval model in contrast to the rigorous set
membership of the Boolean retrieval model. More formally,
document $P_i$ is assigned a relevance score, $R_{i,q}$,
with respect to query $q$ as follows.

$$R_{i,q} = \sum_{j=1}^{M} C_{i,j} \qquad (1)$$

Notice that term frequency is not used in the formula. 
It is assumed that the query does not contain any
disjunctions or negations. Disjunctions can be removed
by normalization or splitting the query into separate 
conjunctive clauses. Negations can be removed by dis- 
qualifying all documents containing the negated terms 
prior to the ranking. 

The Boolean Spreading Activation algorithm ex- 
tends this strategy by propagating the occurrence of 
a query word in a document to its neighboring
documents. This is possible in the WWW environment
because a document can have hyperlink(s) to/from
one or more other document(s), forming a network
of documents. We assume that if two documents are
linked to one another there must be some semantic
relation(s) between the two. In other words, document
$P_i$ which does not contain query word $Q_j$ but is linked
to another document $P_k$ containing $Q_j$ is treated as if
it contains $Q_j$. However, we assign $P_i$ a smaller
score than if it actually contained $Q_j$. For each WWW
page $P_i$, the algorithm assigns a relevance score with
respect to query $q$ as follows.

$$R_{i,q} = \sum_{j=1}^{M} I_{i,j} \qquad (2)$$

where $I_{i,j}$ is defined as:

$$I_{i,j} = \begin{cases}
c_1 & \text{if } C_{i,j} = 1 \\
c_2 & \text{if there exists } k \text{ such that } C_{k,j} = 1 \text{ and } \mathit{Li}_{i,k} + \mathit{Lo}_{i,k} > 0 \\
0 & \text{otherwise}
\end{cases}$$

$c_1$ and $c_2$ are constants ($c_1, c_2 > 0$) where $c_1 > c_2$.
The algorithm is not sensitive to the values of these
two constants. We show this empirically in the
next section. In the implementation, we use $c_1 = 10$
and $c_2 = 1$.
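
As an illustration of equation (2), the following Python sketch scores a small collection. It is a hedged reading of the definition rather than the authors' code: the function name, the representation of pages as keyword sets, and the representation of hyperlinks as (source, target) pairs are assumptions made for the example. A page receives $c_1$ for each query word it contains and $c_2$ for each query word found only in a page linked to or from it.

```python
def boolean_spreading_activation(pages, links, query, c1=10, c2=1):
    """Score pages following equation (2).

    pages: dict mapping page-ID -> set of keywords in the page
    links: set of (source, target) page-ID pairs, one per hyperlink
    query: list of query words
    """
    # Li + Lo > 0 holds for a link in either direction, so the
    # neighbourhood is treated as undirected.
    neighbours = {p: set() for p in pages}
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    scores = {}
    for p, words in pages.items():
        total = 0
        for q in query:
            if q in words:                                   # C[i][j] = 1
                total += c1
            elif any(q in pages[k] for k in neighbours[p]):  # a linked page has q
                total += c2
        scores[p] = total
    return scores
```

Setting `c2=0` in this sketch disables the propagation step and reduces the score to `c1` times the count of equation (1).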
3.2.2 Most-cited

As with Boolean Spreading Activation, this algorithm
takes advantage of information about hyperlinks
between WWW pages. Each page is assigned a relevance
score which is the sum of the number of query words
contained in other pages citing, or having a hyperlink
referring to, the page. More formally, the relevance
score of page $P_i$ with respect to query $q$ is defined as:

$$R_{i,q} = \sum_{k=1,\, k \ne i}^{N} \Big( \mathit{Li}_{i,k} \sum_{j=1}^{M} C_{k,j} \Big) \qquad (3)$$

3.2.3 TFxIDF

The TFxIDF algorithm is based on the well known
vector space model [12], which typically uses the cosine
of the angle between the document and query vectors
in a multi-dimensional space as the similarity measure.
As described in [13], vector-length normalization can
be applied when computing the relevance score, $R_{i,q}$,
of page $P_i$ with respect to query $q$:

$$R_{i,q} = \frac{\sum_{Q_j \in P_i} \big(0.5 + 0.5\, \frac{TF_{i,j}}{TF_{i,max}}\big)\, IDF_j}{\sqrt{\sum_{\text{term}_j \in P_i} \big(0.5 + 0.5\, \frac{TF_{i,j}}{TF_{i,max}}\big)^2 (IDF_j)^2}} \qquad (4)$$

where:

$TF_{i,j}$ : the term frequency of $Q_j$ in $P_i$

$TF_{i,max}$ : the maximum term frequency of a
keyword in $P_i$

$IDF_j$ : $\log\big(N / \sum_{i=1}^{N} C_{i,j}\big)$

Generally speaking, the relevance score of a document
is the sum of the weights of the query terms that
appear in the document, normalized by the Euclidean
vector length of the document. The weight of a term
is a function of the word's occurrence frequency (also
called the term frequency) in the document and the
number of documents containing the word in the
collection (i.e., the inverse document frequency). This
weighting function gives higher weights to terms which
occur frequently in a small set of the documents.

The full vector space model is very expensive to
implement, because the normalization factor is very
expensive to compute. In our TFxIDF algorithm, the
normalization factor is not used. That is, the relevance
score is computed by:

$$R_{i,q} = \sum_{Q_j \in P_i} \Big(0.5 + 0.5\, \frac{TF_{i,j}}{TF_{i,max}}\Big)\, IDF_j \qquad (5)$$

In the next section, we show empirically that this
simplified method works better in the WWW environment
than the vector space model with vector-length
normalization.

3.2.4 Vector Spreading Activation

This algorithm combines the vector space model
and spreading activation model. In this algorithm,
each document is first assigned a relevance score using
the TFxIDF algorithm, then the score of a document
is propagated to the documents it references. More
formally, the algorithm assigns a relevance score to
page $P_i$ with respect to query $q$ as follows.

$$R_{i,q} = S_{i,q} + \sum_{j=1,\, j \ne i}^{N} \alpha\, \mathit{Li}_{i,j}\, S_{j,q} \qquad (6)$$

where $S_{i,q}$ is the TFxIDF score of $P_i$ as defined in
equation 5. $\alpha$ ($0 < \alpha < 1$) is a constant link weight.
Through an experiment (discussed in the next section),
we found that 0.2 is the optimal value of $\alpha$.

The objective of this algorithm is to assign, among
potentially relevant documents, larger scores to the
referenced documents than to the referencing
documents.
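
The Most-cited, TFxIDF and Vector Spreading Activation scores can be compared side by side in a short Python sketch. The code is an illustrative reading of the definitions under stated assumptions, not the authors' implementation: pages are modelled as dictionaries mapping keyword to term frequency, hyperlinks as a set of (source, target) pairs, the logarithm is taken as the natural logarithm, and the TFxIDF sum runs only over query words that actually occur in the page. All function names are hypothetical.

```python
import math


def most_cited(pages, links, query):
    """Sum, over pages citing P_i, of the number of query words they contain."""
    scores = {p: 0 for p in pages}
    for src, dst in links:                      # hyperlink from src to dst
        if src != dst:
            scores[dst] += sum(1 for q in query if pages[src].get(q, 0) > 0)
    return scores


def tfidf(pages, query):
    """Augmented term frequency times IDF, without length normalization."""
    n = len(pages)
    scores = {}
    for p, tf in pages.items():
        tf_max = max(tf.values(), default=0)
        score = 0.0
        for q in query:
            if tf.get(q, 0) == 0:
                continue                        # only query words present in the page
            df = sum(1 for other in pages.values() if other.get(q, 0) > 0)
            score += (0.5 + 0.5 * tf[q] / tf_max) * math.log(n / df)
        scores[p] = score
    return scores


def vector_spreading_activation(pages, links, query, alpha=0.2):
    """TFxIDF score plus alpha times the TFxIDF scores of citing pages."""
    base = tfidf(pages, query)
    scores = dict(base)
    for src, dst in links:
        if src != dst:
            scores[dst] += alpha * base[src]
    return scores
```

Note that propagation uses the unmodified TFxIDF scores in `base`, so activation spreads exactly one hyperlink away, from the referencing page to the referenced page.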

4 Retrieval Effectiveness 

We evaluated the four algorithms on an index 
database covering pages at the Chinese University of 
Hong Kong (CUHK.HK domain). The CUHK site was
chosen because of its reasonable size and its diverse
collection of information provided by the university's
various departments, from the humanities to
engineering. We froze the entire WWW collection by 
copying all of the WWW pages from the site to a local 
disk. This was done on April 26, 1995. We recorded 
2393 WWW pages including 1139 non-HTML pages 
(non-inline image, sound, video, click-able map, CGI 
script, and other text files). We then built an index 
from the full-text collection as described before. 

56 test queries^9 were used. The test queries were
generated as follows. First, we selected at random 
100 WWW pages from the collection. Of these 100 
pages, we removed pages of directory type (indices, 
catalogs, hotlists, tables of contents, etc.) and non- 
HTML pages, resulting in 56 pages. Next, for each of 
these 56 pages, we manually extracted keywords from
it, selected keywords which can be used to construct
a phrase representing the central concept (topic) of 
the page, and constructed a query from that phrase. 
Boolean OR operators, along with scope markers, were 
used in the queries to specify synonyms. 

Given a query, the judgment on whether a WWW
page is relevant or not is somewhat ambiguous, as it
is very subjective and may vary across users. For this
evaluation, we define relevance in the context of
resource discovery, i.e., a WWW page is considered
relevant to a query if, by accessing the page, the user can
find a resource address (URL) containing information
pertinent to the query, or if the page itself is such a
resource. We manually examined the entire collection
to identify the relevant WWW pages for each query.

^9 The test queries are listed in the Appendix of a report
accessible at <http://dbx.cs.ust.hk:8000/doc/wwwindex.ps>.

Figure 3: Recall-precision curve for each of the four
search algorithms obtained by averaging the curves
over the 56 test queries.

We used the standard evaluation procedure [12] to 
compute the average interpolated recall/precision for 
each of the algorithms. The recall/precision curves are
shown in Figure 3. The high average precision obtained
in this experiment is attributed to the query construc- 
tion procedure which guarantees that each query has 
at least one highly relevant document. Figure 3 shows 
that Vector Spreading Activation has the best retrieval 
performance, followed by TFxIDF, Boolean Spreading 
Activation and Most-cited. 
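
For readers unfamiliar with interpolated recall/precision, the computation can be sketched as below. This assumes the common 11-point scheme (recall levels 0.0, 0.1, ..., 1.0, with precision at each level taken as the best precision at any equal or higher recall); the exact procedure used in the experiment is the one described in [12], and the function names here are hypothetical.

```python
def interpolated_precision(ranking, relevant, levels=None):
    """Interpolated precision at standard recall levels for one query.

    ranking:  page-IDs in ranked order
    relevant: set of page-IDs judged relevant (assumed non-empty)
    """
    if levels is None:
        levels = [i / 10 for i in range(11)]     # 0.0, 0.1, ..., 1.0
    points = []                                   # (recall, precision) at each hit
    hits = 0
    for rank, page in enumerate(ranking, start=1):
        if page in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Interpolation: best precision observed at any recall >= the level.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]


def average_curves(curves):
    """Average several per-query curves level by level."""
    return [sum(column) / len(column) for column in zip(*curves)]
```

Averaging the per-query curves with `average_curves` yields one curve per algorithm, which is the form in which the results are plotted.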

As mentioned in the previous section, we also con- 
ducted an experiment using the same 56 test queries to 
see whether vector-length normalization (see equation 
4) could improve the retrieval effectiveness of
TFxIDF. Figure 4 shows the recall-precision of TFxIDF
with and without such a normalization. According to 
Salton and Buckley [13], vector-length normalization 
typically does not work well for short documents. This 
is consistent with our result since the average page 
length in the test collection is only 48.11 words (not 
including stop words and other words removed during 
the indexing process). It may be the case that vector- 
length normalization, in general, does not work for 
documents where the size of a segment with a coher- 
ent topic is small, e.g., a few sentences. In a WWW 
environment, it is common that a topic is only
represented in a page by a hypertext anchor (click-able
phrase).




Figure 4: Recall-precision curve for TFxIDF with and 
without vector-length normalization obtained by av- 
eraging the curves over the 56 test queries. 



To evaluate the sensitivity of Boolean Spread- 
ing Activation and Vector Spreading Activation algo- 
rithms to the choice of parameter values used, we con- 
ducted performance evaluation similar to the above on 
a range of parameter values. 



Figure 5 shows the recall-precision curves of
Boolean Spreading Activation on the 56 test queries
using a number of $c_1$ and $c_2$ value combinations (see
equation 2). By setting $c_2$ equal to 0, that is, by
disabling the spreading activation effect, we obtained
the retrieval effectiveness of the fuzzy set retrieval
model (see equation 1). Enabling the spreading
activation effect by setting $0 < c_2 < c_1$ improved the
algorithm's recall. The algorithm produced the same
recall-precision curve for $(c_1, c_2)$ combinations of (2,1),
(5,1), (10,1) and (10,5). Poor retrieval effectiveness
resulted when $c_1$ was set equal to $c_2$, in which case many
pages with various degrees of actual relevance had the
same relevance score.

Figure 6 shows the recall-precision curves of Vector
Spreading Activation on the 56 test queries with $\alpha$
parameter values of 0.0 (TFxIDF without the spreading
activation effect), 0.1, 0.2, 0.3, 0.4 and 0.5 (see
equation 6). The best retrieval performance was achieved
with $\alpha$ equal to 0.2.
Figure 5: Recall-precision curves of Boolean Spreading
Activation algorithm on the 56 test queries with
$(c_1, c_2)$ value pairs of (1,0), (2,1) and (1,1).

Figure 6: Recall-precision curves of Vector Spreading
Activation algorithm on the 56 test queries with $\alpha$
values of 0.0, 0.1, 0.2, 0.3, 0.4 and 0.5.



5 Discussion and Future Work 
5.1 Conclusion 

The results of the experiment in Section 4 provide 
us with some hints on the nature of WWW informa- 
tion retrieval environment. The relatively superior re- 
trieval effectiveness of TFxIDF and Vector Spreading 
Activation search algorithms shows that the concen- 
tration or distribution of words in a WWW page and 
across WWW pages is a good indicator of the page's 
contents or portions thereof. Algorithms which rely 
on meta-information such as hyperlink information,
while intuitive, did not perform as well. This shows 
that the interconnectivity between WWW pages is not 
a reliable indicator of semantic relationships between 
the contents of the linked pages. The poor overall 
performances of these ranking algorithms may also be 
attributed to the fact that many WWW pages con- 
tain many different and unrelated topics in a single 
page. This is true for many home pages, index pages, 
what's-new pages, hotlist, directory and catalog pages 
which occur frequently in the WWW environment. It 
is worth noting that, since most of the test queries 
are phrases taken out from actual pages with some 
synonyms added, all of the algorithms showed good 
recall.

5.2 Other Index Servers 

There are many robot-based WWW index and
search services on the Internet today.^10 However,
among the well known robots, only a few employ
full-text indexing, e.g., WebCrawler^11 [11], the Repository
Based Software Engineering Project Spider^12 (RBSE)
[5], and Lycos.^13 Other services index only page
titles and first level headings (e.g., JumpStation^14) or
titles, headings and anchor hypertexts (e.g., World
Wide Web Worm or WWWW^15). Our indexer robot
takes other HTML tokens such as words in bold-face
or italics and the first sentence of every list item, in
addition to titles, all-level headings and anchor hypertexts.
Our scheme is a balance between full-text and
title-only schemes by taking advantage of HTML
meta-information as much as possible. On a WWW page
containing mostly lists such as an index page, our
scheme extracts nearly as many words as a full-text
scheme.

Not many index servers use sophisticated
information retrieval models beyond a simple Boolean
model or pattern matching based on Unix egrep (e.g.,
WWWW), with the exception of the RBSE, the
WebCrawler, and Lycos. The RBSE uses the WAIS search
engine, which ranks WWW pages based on the
occurrence frequencies or term frequency (TF) of the query
words [10]. The WebCrawler and Lycos, as with our
index server, rank the pages based on term frequency
and inverse document frequency (TFxIDF). As of this
writing, we are not aware of any attempt to
quantitatively measure the retrieval effectiveness of search
algorithms in the WWW environment as in the present
work.

^10 A list of WWW robots can be found at
<http://info.webcrawler.com/mak/projects/robots/active.html>.
^11 <http://webcrawler.cs.washington.edu/WebCrawler/Home.html>.
^12 <http://rbse.jsc.nasa.gov/eichmann/urlsearch.html>.
^13 <http://lycos.cs.cmu.edu/>.
^14 <http://www.stir.ac.uk/jsbin/js>.
^15 <http://www.cs.colorado.edu/home/mcbryan/WWWW.html>.

5.3 Future Work

As part of an on-going research project, our next 
step in developing the WWW index server is to study 
the effectiveness of other information retrieval tech- 
niques, including relevance feedback and sophisticated 
user interface techniques. We are also working on
developing methods based on word distribution statistics,
which have proven useful in this paper, for
automatic catalog generation, index database
compression/summarization and multi-server indexing.
ABSTRACT
Query specification for content-based image retrieval
is typically accomplished through query-by-example
paradigms, such as query-by-image and query-by-sketch.
In some cases query-by-sketch can be difficult
(lack of sketching abilities, difficulty in detecting
distinguishing image features), and querying is therefore
performed through the query-by-image paradigm.
However, this paradigm often fails since a single sample
image rarely includes all and only the characterizing
elements the user is looking for.

This paper presents a system that supports
query-by-image using multiple image examples, both positive and
negative. The system also enables editing of examples
so as to disregard those image features that are not
relevant to the query.

1. INTRODUCTION 

Content Based Image Retrieval (CBIR) is an extension
to traditional information retrieval that supports
querying of images through a visual specification, thus
addressing perceptual content of visual information [1].
Typically, queries are performed through the
query-by-example paradigm, which requires the user to provide a
representation of the visual concept characterizing the
searched image.

In modern CBIR systems, querying-by-example
mainly takes place in two different forms:
query-by-image and query-by-sketch. Query-by-image allows the
user to take a sample image, either from a sample image
set or from the answer to a previous query, and
use it as a prototype to retrieve images with similar
content. This paradigm supports retrieval by global
image similarity, addressing global color/texture
distribution and the overall image structure.
Query-by-sketch allows the user to sketch contours of salient
regions on a white-board. Regions can be manually
authored or traced from an image, following the objects'
contours. Once they have been sketched, regions can
be characterized by color, texture, shape, position, and
area. This paradigm supports retrieval based on local
properties of images.

Although it is widely employed in current systems
for content-based retrieval, query-by-sketch has several
drawbacks. The main drawback is that users find
it difficult to create from scratch suitable images
embodying the visual concept to be searched for. This
is usually due to the lack of visual memory of most
users, lack of sketching abilities, and difficulty of
detecting those distinguishing salient features that
actually characterize that concept. Because of this, examples
are built, in most cases, according to the query-by-image
paradigm. Existing images, either retrieved with
a previous query or selected from a random subset of 
the database, are used as examples. A limiting fac- 
tor, in this case, is that a single sample image rarely 
includes all and only the characterizing elements of a 
visual concept. On the one hand, a sole example typ- 
ically includes a subset of the characterizing features. 
On the other hand, some features included in the ex- 
ample may not contribute to the definition of the vi- 
sual concept. The former issue can be addressed by 
querying through multiple sample images. The latter 
requires that content editing of each example be sup- 
ported by the system. 

Furthermore, specification of a visual concept can
be achieved by submitting several examples, providing
positive as well as negative instances of that concept.
Usage of positive and negative examples has been
exploited for interactive learning of image classification
([6], [7], [9]). Examples are used to associate low-level
features with high-level concepts to be used at query
time.

Positive and negative examples have also been used
to support relevance-feedback interaction ([4], [5], [10]),
which allows the user to score the relevance of retrieved
images. Relevance scores are used to change the system
similarity measure so as to converge to a result that
matches the user's expectations. This prevents the use of
any indexing structure that develops on a predefined
similarity measure.

0-7803-6536-4/00/$10.00 (c) 2000 IEEE

In this paper a system is presented that supports
efficient querying by positive and negative examples.
The system features a color-based retrieval engine.
Histograms encode color content and an M-tree structure
is used for indexing. Querying is performed through a
visual interface which lets the user specify positive and
negative examples.

The paper is organized as follows: Section 2
provides insight on content representation through
histograms, and related properties; then, Section 3
expounds our solution for retrieval by positive and
negative examples; finally, in Section 4 experimental results
are reported and conclusions are drawn.

2. CONTENT REPRESENTATION
THROUGH COLOR HISTOGRAMS 

A generic histogram $H$ with $n$ bins is an element of the
histogram space, a subset of $\mathbb{R}^n$.

Given an image and a discretization of a feature 
space, histogram bins count the number of occurrences
of points of that feature space in the image. Histograms 
provide a synthetic representation for content, and have 
been used for different features, such as color and shape 
([2], [11]).

Histograms also support a multi-resolution description
of image features. Given a partitioning of an image
into $n$ fine-grained regions, histograms provide a
representation for the content of each of these regions. The
representation of wider regions, at a less fine-grained
level, can be computed by merging the histograms of
each region $\mathcal{R}_j^{l}$ at level $l$ that contributes to the region
at level $l+1$ ($\mathcal{R}^{l+1} = \bigcup_j \mathcal{R}_j^{l}$). Each $i$-th bin
$h_i^{l+1}$ at level $l+1$ is computed as follows:

It is also possible to provide lower resolution descriptions
based on a coarser discretization of the feature
space, with dimension $n' < n$.

In order to compute the similarity between two histograms,
a norm must be defined in the histogram
space. This is accomplished through the introduction
of a positive definite, symmetric distance matrix
$A \in \mathbb{R}^{n \times n}$ that allows the distance between two
histograms $H$ and $H'$ to be computed as:

$$\mathcal{D}(H, H') = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} (h_i - h'_i)(h_j - h'_j) \qquad (2)$$

where $h_i$ ($h'_i$) is the i-th element of $H$ ($H'$).

This expression shows that elements $a_{ij}$ weight
the extent to which the difference between the i-th and
j-th bins contributes to $\mathcal{D}$. If the distance matrix A is
a diagonal one, only differences between corresponding
bins contribute to the computation of $\mathcal{D}$. However, if A
is not a diagonal matrix, elements $a_{ij}$ ($i \neq j$) can be used
to model the cross-distance between non-corresponding
elements. This is particularly useful in order to partly
recover from the loss of information associated with
the discrete nature of content representation through
histograms.
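The quadratic-form distance of Eq. (2) can be sketched in a few lines. This is an illustration only: the function name `quadratic_distance` and the tiny 3-bin matrices in the usage note are assumptions, not the 39-bin matrix used by the system.

```python
def quadratic_distance(h, h_prime, a):
    """Distance between two histograms weighted by a symmetric matrix a.

    Computes sum_i sum_j a[i][j] * (h[i] - h'[i]) * (h[j] - h'[j]).
    """
    n = len(h)
    if len(h_prime) != n or len(a) != n or any(len(row) != n for row in a):
        raise ValueError("histograms and matrix must share dimension n")
    diff = [h[i] - h_prime[i] for i in range(n)]
    return sum(a[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))
```

With the identity matrix this reduces to the squared Euclidean distance between the two histograms. Adding a positive off-diagonal entry between two bins lowers the distance when mass moves between those two bins, which is how cross-bin terms soften the effect of discretization.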

For the implementation of the system presented in
this paper, 39 reference colors were selected to
discretize the color space, and histograms with 39 bins
are used to encode color information for each of the
tiles into which images are partitioned. Histograms are
normalized with respect to the image size so as to provide
scale invariance of the representation.
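A minimal sketch of building a size-normalized color histogram. The text does not say how a pixel is assigned to a reference color or which color space is used; nearest reference color by squared Euclidean distance in RGB is an assumption made here for illustration, as are the function name and the three-color palette in the usage note.

```python
def color_histogram(pixels, reference_colors):
    """Return the fraction of pixels assigned to each reference color.

    Assumption: each pixel goes to its nearest reference color
    (squared Euclidean distance on the color triple).
    """
    counts = [0] * len(reference_colors)
    for pixel in pixels:
        nearest = min(
            range(len(reference_colors)),
            key=lambda k: sum(
                (c - r) ** 2 for c, r in zip(pixel, reference_colors[k])
            ),
        )
        counts[nearest] += 1
    total = len(pixels)
    if total == 0:
        return [0.0] * len(counts)
    # Dividing by the pixel count makes the histogram independent of size.
    return [count / total for count in counts]
```

For example, with reference colors red, green, and blue, an image and a copy with every pixel duplicated produce the same histogram, which is the scale invariance mentioned above.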

A distance matrix $A_c$ has been defined for the
computation of the distance between color histograms and
structure histograms, respectively. The color distance
matrix $A_c$ is a 39 x 39 symmetric matrix whose
elements $a_{ij}$ encode the perceptual similarity between the
i-th and j-th reference colors.

3. COMBINING MULTIPLE EXAMPLES
INTO A SINGLE QUERY

The basic idea underlying our approach for querying 
by positive and negative examples is illustrated in Fig- 
ure 1. 

Let $\{P^i\}_{i=1}^{p}$ be the set of positive sample histograms
and $\{N^j\}_{j=1}^{q}$ be the set of negative sample histograms.
We want to represent the query requirements expressed
through the positive and negative samples using a
single composite query histogram $H$. The composite
query histogram $H$ must be close to all the elements
of $\{P^i\}_{i=1}^{p}$ and far from all the elements of $\{N^j\}_{j=1}^{q}$.
Hence $H$ can be found as the histogram which
minimizes the following functional:

$$\mathcal{F}(H) = \sum_{i=1}^{p} \mathcal{D}(H, P^i) - \sum_{j=1}^{q} \mathcal{D}(H, N^j) \qquad (3)$$

where $\mathcal{D}(\cdot,\cdot)$ is the histogram distance function. That is:

$$\mathcal{D}(H, P^i) = \sum_{r} \sum_{s} (H_r - P^i_r)(H_s - P^i_s)\, a_{rs} \qquad (4)$$

Since the solution to Eq. (3) must be a regular
histogram, Eq. (3) must be minimized subject to two
distinct constraints. These are related to the norm of the
solution (the histogram must be a normalized vector)
and to the positiveness of individual histogram
components.

Figure 1: A number of examples, both positive and
negative, is used to describe the visual concept. The
system combines histograms of the examples to derive
a single composite query histogram representing the
visual concept. An image embodying the visual concept
is also shown.

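Thus, the composite query histogram $H = \{H_1, \dots, H_n\}$
is the solution of an optimization problem
with bilateral and unilateral constraints:

$$\min_{H} \mathcal{F}(H) \quad \text{subject to} \quad \begin{cases} \sum_{i=1}^{n} H_i = 1 \\ H_i \ge 0 \quad \forall i = 1, \dots, n \end{cases} \qquad (6)$$

where $\mathcal{F}$ is the functional of Eq. (3).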



The resulting histogram summarizes the positive and
negative examples, and could be thought of as the
histogram of a query image featuring the visual concept
that is the object of the user's query.
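The text does not say which solver is used for the constrained minimization. As one possible illustration, the sketch below swaps in projected gradient descent: it steps along the negative gradient of the functional and then projects back onto the set of normalized, non-negative histograms. The function names, step size, and iteration count are assumptions; the sketch behaves best when positive examples outnumber negative ones, since the functional is then convex for a positive definite matrix.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        shift = (cumulative - 1.0) / k
        if u_k - shift > 0:
            theta = shift  # keep the shift of the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def composite_query(positives, negatives, a, steps=500, lr=0.05):
    """Approximate the composite histogram by projected gradient descent.

    positives, negatives: lists of histograms (lists of n floats).
    a: symmetric n x n distance matrix (list of lists).
    lr must be small relative to the scale of a and of p - q.
    """
    n = len(positives[0])
    p, q = len(positives), len(negatives)
    # Sum of positive examples minus sum of negative examples, per bin.
    target = [
        sum(P[i] for P in positives) - sum(N[i] for N in negatives)
        for i in range(n)
    ]
    h = [1.0 / n] * n  # start from the uniform histogram
    for _ in range(steps):
        # Gradient of the functional for symmetric a:
        # 2 * a * ((p - q) * h - target)
        r = [(p - q) * h[i] - target[i] for i in range(n)]
        grad = [2.0 * sum(a[i][j] * r[j] for j in range(n)) for i in range(n)]
        h = project_to_simplex([h[i] - lr * grad[i] for i in range(n)])
    return h
```

With two identical positive examples that split their mass between bins 1 and 2, and one negative example concentrated in bin 2, the result moves all of its mass to bin 1: the negative example pushes the query away from the color it marks as unwanted.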



4. A PROTOTYPE IMPLEMENTATION 

The user interface is shown in Figure 2. The interface 
is composed of two main parts. The lower part in- 
cludes the output image viewer. This displays images 
that are either randomly selected (obtained through 
the Randomize button) or retrieved as the result of a 
query session. The upper part of the interface includes 
two panels on the left and on the right — used to collect 
positive and negative examples, respectively — , and one 
editing area in the middle. 

Images can be selected from the output viewer and 
added either to the positive or negative sample sets 
(through the Add Positive and Add Negative buttons, 
respectively). 

A simple query with one positive sample image is 
shown in Figure 2. Retrieved images are displayed in 
the lower part of the interface. Similarly to the query 
image, all retrieved images feature bright and saturated 
red, cyan and green colors. 

In Figure 3 a query refinement is shown. One of the
retrieved images (3rd row, 4th column in Figure 2) is
added to the negative sample set. The regions of this
image that feature white and blue patterns are labeled
as not relevant, and only regions with a green pattern
are retained. The content of the positive sample image
is edited as well. Image regions featuring a green
pattern are labeled as not relevant. Retrieved images are
shown in the lower part of the interface: all the images
feature bright red and/or cyan patterns. No image
features relevant green patterns. It can be noticed that
the image used as positive example is not retrieved in 
the first position (it is ranked in the S**"^ position). This 
is indeed consistent with the query, which includes a
negative example with a large green region. As a result,
the ranking score of the image used as positive example,
which features a green region on the right, is
penalized.
ABSTRACT

On-line collections of images are growing larger and
more common, and tools are needed to efficiently
manage, organize, and navigate through them. We have
developed a prototype system called QBIC which
allows complex multi-object and multi-feature queries of
large image databases. The queries are based on image
content: the colors, textures, shapes, and positions of
images and the objects/regions they contain. The
system computes numeric features to represent the image
properties and uses similarity measures based on these
features for image retrieval. The focus of this paper is
its user interface, which allows a user to graphically pose
and refine queries based on multiple visual properties
of images and their objects.

1. INTRODUCTION 

Today's hardware technology allows us to acquire,
store, manipulate, and transmit large numbers of
digital images, and to generate large on-line image
collections. Current systems for image management,
retrieval, and database functions rely almost exclusively
on keywords or text associated with each image for
their access method. To complement the text-based
methods, we are studying content-based approaches
that: (1) are based on computed features such as color,
texture, shape, and position of images and the objects
they contain; (2) use similarity measures and retrieve
images "like" or similar to a given image (returning
them in ranked order); (3) allow complex queries in
the sense that query predicates can specify multiple
features of multiple objects in a scene ("a red round
object on the left and large blue textured region on the
right"); and (4) handle queries that are posed visually
and graphically, such as by painting an approximate
query image or picking a sample texture from a palette
of textures. The emphasis of this paper is the graphical
user interface we have developed that allows users to
pose these queries and the facilities it provides to
construct "multi-queries" involving global image features
as well as multiple features of multiple objects.

Querying image databases is an active area of
research. A survey of systems with query-by-content
capabilities is given in [1]. QBIC has been presented in
[2, 3, 4, 5, 6], and other recent systems are described
in [7, 8]. Previous graphical user interface methods
include placing icons that represent semantic objects
such as house and car in a query area [9] and its
extension to 3D using a manipulating glove to place icons in
space to represent an image to be retrieved [10].

2. QBIC QUERIES 

The QBIC system distinguishes between "scenes" (or
images) and "objects". A scene is a full color image
or single frame of video and an object is a part of a
scene, e.g., a person in a beach scene. Each scene has
zero or more objects. Objects are identified manually
or semi-automatically. QBIC computes the following
features:



Objects

1. average color 

2. histogram color 

3. texture 

4. shape 

5. location 



Images 

1. average color 

2. histogram color 

3. texture 

4. positional edges (sketch)

5. positional color (draw) 



Further descriptions of these features are given
in [2]. The features, precomputed and stored in a
database for subsequent queries, are vectors of numeric
values. For example, the average color feature is a 3
element vector and the shape feature is a 20 element
vector.

Once the features for objects and images have been
computed, queries may then be executed. Queries may
be object-based, in which a user requests images
containing objects with certain features, scene-based, in
which full scenes with certain features are retrieved, or
a combination of both. For example, an object-based
query might search for images containing square red
objects. A scene-based query might search for images
(scenes) having a certain percentage of the colors red
and blue. Query results are based on the similarity of
database items to the query items (either objects or
scenes) and use similarity functions for each feature.
These similarity functions are normalized so that they
can be meaningfully combined. Most of the similarity
functions are based on the weighted Euclidean distance
in the corresponding feature space (e.g., three
dimensional average color), although some, for example, color
histograms, have their own similarity function. See [2]
for details.
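As an illustration of the weighted Euclidean distance mentioned above, the following sketch compares two feature vectors such as 3-element average colors. The function name and the per-dimension weights are assumptions; the actual weights used by QBIC are described in [2], not here.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))
```

A smaller value means the database item is closer to the query item for that feature; setting a weight to zero removes that dimension from the comparison.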

3. USER INTERFACE 

A query specification can include color, texture, shape,
location, and/or text features of a single object or scene
or, in the case of "multi-queries", of scene features
together with multiple objects, each of which has
multiple features.

The user interface to support such queries has two 
main parts: the Query Specification windows to pose 
the queries and the Query Results window to display 
the results. 

3.1. The Query Specification Windows 

The Query Specification Windows form a hierarchy of 
three levels. The top level is the symbolic representa- 
tion of the query, the middle controls the enabling and 
weighting of feature values, and the lowest allows the 
feature values to be set or selected. 

Level 1: The Query Window. An example of the
top level, or Query Window, is shown in Figure 1. The 
window has a menu bar and a query area. The user can 
specify an object or a scene component of a query by 
selecting appropriately from the Create option on the 
menu bar, which in turn, places an "object" or "scene" 
icon into the query area. An icon with a rectangle rep- 
resents a scene and one without a rectangle represents 
an object. Multi-queries are formed by creating mul- 
tiple icons in the query area. By clicking on the icon, 
the user can get a menu which contains the Copy (repli- 
cate the icon and all of its features), Delete, Move, or 
Edit options. This last option causes the appearance 
of the corresponding Object or Scene Feature Windows 
in Figures 2 and 3, which form the middle level of the 
query hierarchy. 

Level 2: The Feature Windows. These windows
contain all the features available for object or scene
queries. Each available feature has an enable
button, a feature type field, editor/sampler radio buttons,
a value/picker button, and a weight slider. The
enable button signifies that the feature is to be used in
queries. The weight of the feature is determined by
a slider which goes from 0 to 1, allowing the user to
adjust the relative importance of the feature during
queries. Each feature can be set either by using an
editor or by selecting a pre-defined value from a "sampler".
The editor/sampler radio buttons determine whether
to bring up an editor or sampler whenever the
associated value/picker button is pushed. When a feature
value of an object or scene is set, the value/picker
button is modified to represent that value. For example,
when a color is selected, it is painted on the color value
button; when a shape is drawn, it is reproduced on the
button, etc. Notice in Figure 2 that the buttons match
the properties selected in Figures 4, 5, 6, and 7.

Figure 1: The query window.

Level 3: The Editors and Samplers. To set the
feature values, the user clicks on the value buttons in the
Feature Windows. Depending on which of the Editor
or Sampler radio buttons is enabled, the user gets
either an Editor or a Sampler. Editors allow any values
to be set/drawn and Samplers allow one value from a
given set to be chosen. The editor for color, Figure
4, uses a standard set of (R, G, B) sliders and a color
patch area which shows the current selected color. The
histogram color editor, Figure 5, allows multiple
colors and their amounts to be selected. The palette on
the left of the picker displays the set of selectable
colors. Once a color is selected, it will appear in a box on
the right and the relative amount of the color can be
specified by the sliders.

Shape, sketch, and draw use the same editor, Figure
6, which incorporates a palette of colors on the left and
a drawing area on the right. In between, the currently
selected color is shown, together with a row of buttons:
Polygon, Ellipse, Rectangle, and Line, for the drawing
mode desired. The user clicks on a color in the palette
to choose a color, selects a drawing mode, and draws
in the drawing area. An erase mode is also available.

Figure 2: The object feature window lets the user
enable, disable, and weight object features and call up
the associated pickers.

Figure 3: The scene feature window lets the user
enable, disable, and weight scene features and call up the
associated pickers.

The location editor, Figure 7, is one way the user
can specify an (x, y) location of an object in a query.
(The other is by positioning the icon within the query
window.) The picker is a simple blank area with an "X"
marking the spot where the object should be located.
Lastly, the text editor is a simple window for entering
text.

A Sampler displays a set of samples from which
one can be chosen. An example is the texture
sampler shown in Figure 8. Multiple samplers for a given
feature are possible and are arranged hierarchically,
allowing, for example, texture samplers
arranged as ALL-TEXTURES containing the two subsets
NATURAL and SYNTHETIC.




Figure 4: The RGB color picker. 
Figure 5: The multicolor color picker.

Once the features of a scene or object are set, its
icon in the query area will display symbols representing
each of the features enabled for querying, as shown in
Figure 1. The color feature is symbolized by filling the
icon with the selected color. A small RGB histogram
on an icon represents the histogram color feature. The
symbol for the texture feature is a hash pattern. The
text feature is represented by a letter. The location
feature in an object is symbolized by four arrows
at each corner of the icon. The direction of the
arrows indicates whether the location of the icon in the
query area (arrows pointing inward) or the location in
the location picker (arrows pointing outward) is used.
The shape feature of an object is shown on an object
icon as a circle. In a scene icon, the sketch feature
is represented by two stick figures at each end of the
rectangle.

Figure 6: The freehand drawing window for specifying
an object shape or scene sketch.

Figure 7: The location picker.

Figure 8: A texture Sampler.

Multi-queries can easily be posed using this
interface. For example, one can imagine a query with a red
circular object, a blue object in the center of an image,
and a green area at the lower portion of the scene. The
query area would have three icons with the
appropriate symbols representing the features of the two objects
and the scene. By editing the objects or scene, their
features for the query are shown in greater detail, such
as the actual circular shape or the actual portion which
is green.

3.2. The Query Results Window 

After a query has been specified, it is run by selecting 
Query on the menu bar of the Query Window and then 
Execute. The similarity measures, one for each feature 
of each item in the query, are combined into a final sim- 
ilarity measure. The combined results are obtained by 
normalizing all individual similarity measures by their 
variance and combining them in a weighted sum. 
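A minimal sketch of this combination step. The text does not say over which population the variance is estimated; the sketch assumes it is taken over the per-feature scores of the candidate images in the current query, and it treats each score as a distance (lower means more similar). The function name and dictionary layout are illustrative.

```python
from statistics import pvariance


def combine_scores(per_feature_distances, weights):
    """Combine per-feature distances into one score per image.

    per_feature_distances: {feature_name: [distance for each image]}
    weights: {feature_name: weight in [0, 1]} as set by the weight sliders
    """
    n_images = len(next(iter(per_feature_distances.values())))
    combined = [0.0] * n_images
    for feature, distances in per_feature_distances.items():
        variance = pvariance(distances)
        scale = variance if variance > 0 else 1.0  # avoid division by zero
        for k, d in enumerate(distances):
            combined[k] += weights[feature] * d / scale
    return combined
```

Sorting the images by the combined value in ascending order would then give the best match first.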

The ordered results of a query are displayed in the
Query Results Window from best match to nth best
match (n is user-settable and we usually display the
top 20). Each image returned is displayed as a
reduced "thumbnail". This thumbnail is active and can
be clicked on to give a menu of options. The options
include queries of the form "Find images like this one",
display the similarity value of this image to the query
image, display the (larger) full scale image, place the
image in a holding area for later processing, perform a
user defined image operation or comparison, and so on.

The Query Results Window is divided into two ar- 
eas, the Results Area and the Holding Area. Each time 
a query is executed, the retrieved image thumbnails are 
displayed in the Results Area, overwriting any previ- 
ous results. The user can move any thumbnail or set 
of thumbnails to the Holding Area. This process saves 
them so that they can be compared with later query 
results, used as inputs to subsequent queries, etc. Any
menu operations available on thumbnails in the Results
Area (display, find images like this one, etc.) can also
be applied to thumbnails in the Holding Area.
Results from a sample query are shown in Figure 9.
The same results, clipped to the bounding box of the 
area containing the queried objects, are displayed in 
Figure 10. The query was a multi-query for images 
with two objects. The first object was specified using 
the color picker to select the color red and the shape 
picker to draw a round shape. Similarly, the second 
object was defined to have a green color and round 
shape. 




Figure 9: Results from a query. 



Figure 10: Results from the same query as above. In
this view, the results are "clipped" to the bounding box
of the image containing the objects that participated
in the query.



5. CONCLUSION AND FUTURE WORK 

Databases containing large on-line image collections 
are becoming common and tools are needed to manage, 
organize, and retrieve images from these databases. A 
user interface allowing complex queries and query re- 
finement is an important component of a query/browse 
tool. The QBIC system provides such an interface and 
lets the user pose complex visual queries based on both 
global scene properties and properties of multiple
objects within the scenes. QBIC was implemented in C
and X11/Motif on an IBM RISC/6000.

Areas of future work include combining QBIC with
more powerful text search methods, adding logical
operations such as ANDs, ORs, and NOTs to the
specification of a query, developing new features and their
corresponding similarity matching algorithms, and
presenting all this functionality through a well-designed
and powerful user interface.
Abstract

HyperSQL is an interoperability layer that enables
database administrators to rapidly construct browser-
based query interfaces to remote Sybase databases.
Current browsers (i.e., Netscape, Mosaic, Internet
Explorer) do not easily interoperate with databases
without extensive "CGI" (Common Gateway Interface)
programming. HyperSQL can be used to create forms-
and hypertext-based database interfaces for non-computer
experts (e.g., scientists, business users). Such interfaces
permit the user to query databases by filling out query
forms selected from menus. No knowledge of SQL is
required because the interface automatically composes
SQL from user input. Database results are automatically
formatted as graphics and hypertext, including clickable
links which can issue additional queries for browsing
through related data, bring up other Web pages, or access
remote search engines.

Query interfaces are constructed by inserting a small
set of HyperSQL descriptors and HTML formatting into
text files. No compilation is necessary because commands
are interpreted and carried out by our special gateway,
positioned between the remote databases and the Web
browser. Feedback from developers who have used the
initial release of HyperSQL has been encouraging. At
present, query interfaces have been successfully
implemented for three major NSF-sponsored biological
databases: Microbial Germplasm Database, Mycological
Types Collection, and Vascular Plants Types Collection.



1. Motivation 

Accessing information stored in biological databases
helps speed research and is essential to solve complex
problems requiring data from multiple sources. Such
databases are not likely to be centrally located, because
they are typically maintained under the control of
individual scientists/collectors who know the limitations of
the data.

Many large-scale biological collections are stored in
Sybase databases (e.g., Genome Database, Microbial
Germplasm Database, National Fungus Collection, Nature
Conservancy Database). Sybase is a commercial
relational database management system that employs SQL
(Structured Query Language). SQL was adopted as the
standard query language in the scientific community
because it is an industry standard, is well-defined and has
a solid theoretical foundation in set theory, and is
implemented in successful and proven commercial
products (i.e., Sybase, Oracle, Informix) that run on high-
end UNIX workstations capable of managing large
volumes of data.

Researchers in the biological sciences have not had an
easy time retrieving information stored in SQL databases.
There are four primary reasons for this difficulty. First,
SQL requires that users remember too much information
about database organization. Before users can formulate
queries, they must remember or look up the names of
tables and data items, which requires knowledge of the
database's proprietary command language. Moreover,
users must determine the logical meanings of the data
items, which can be frustrating because of cryptic naming
conventions and the use of synonyms. Often there is little
more available than an uncommented listing of the
database schema. These problems are exacerbated when
tables in the database contain a large number of data
categories.

Second, users are forced to use cryptic command-line
tools (i.e., Sybase isql) intended for database
administrators. These are based on proprietary command
languages¹, present information in a low-level format, and
are unforgiving of mistakes. Furthermore, to access
databases at remote sites, scientists must learn additional
connection utilities (e.g., telnet, ftp).

¹ Although the vendors of the most commonly used SQL
databases (i.e., Sybase, Oracle, Informix) adhere to the SQL
standard, they use incompatible tools, programmatic
interfaces, and command languages.

Third, SQL itself is terse, error prone, and unfriendly.
It is easy to misspell a keyword or data name, omit a "join"
clause, or worse, construct a query that returns unintended
results [Sme95]. We believe it is senseless to expect
scientists, who have neither the time nor the desire to be
experts in computer science, to become proficient in SQL
programming and database theory.

Last, there is a lack of meaningful feedback during the
query process [JV85]. The chances of a casual user
formulating a query correctly on the first attempt are slim.
With command-line SQL, the user must bear the burden of
determining the exact nature of an error. Messages are
cryptic and generally do not reveal the source of the
problem.

Since SQL's inception in the 1970s, database vendors
have virtually ignored the scientific community, instead
building support tools for business and financial users (i.e.,
inventory, accounting, stock analysis). The need for query
assistance tools targeting scientific users has been
expressed by researchers at a number of biocomputing
workshops and conferences [FJP90, ZI94]. To fill the
need, we worked closely with scientists from the
Department of Botany and Plant Pathology to develop
HyperSQL.

HyperSQL is a development tool for constructing Web-
based query interfaces to SQL databases. Our goal is to
make it easier for scientists to retrieve information from
remote scientific databases using common Web browsers
(Netscape, Mosaic, Internet Explorer), which lack the
capability to interoperate directly with SQL databases.
Our software makes it possible to layer browser-based
query interfaces on top of normal SQL. No modifications
to either the Web browser or the SQL database are
required.

With our software, forms and hypertext-based
interfaces eliminate direct exposure to SQL and low-level
database tools. The user queries SQL databases by
entering search criteria into forms. No knowledge of SQL
is required because the interface automatically generates
SQL queries from the user's input. Since the form is
largely independent of SQL, it can use discipline-specific
(rather than computer-related) terminology. The results of
the query are formatted as hypertext, so they can include
hyperlinks that can be selected to browse the database for
related information, bring up other Web pages, or access
remote search engines.

For query assistance tools, browsers offer several
important features. They employ a conceptually simple
hypertext model that even users with little or no computer
training can easily learn and use. The browser serves to
replace a number of low-level tools (i.e., telnet, ftp,
gopher) and provide unified means of displaying data in a
variety of formats (i.e., text, tables, images, sound,
movies); furnish the features needed for front-ends,
namely tables and forms (i.e., text fields, push
buttons, radio lights, pulldown menus, scrolled lists); and
operate independently of both location and computer
platform. Finally, they are available for free or at a low
cost.

The remainder of this paper is organized as follows.
Section 2 demonstrates how HyperSQL interfaces are
used; the following section shows how they are
constructed. Section 4 shows the organization of query
files, while Section 5 discusses the architecture and how
HyperSQL is implemented. Section 6 summarizes related
work by others. The last section discusses the current
status of HyperSQL and our future plans.

2. How HyperSQL Query Interfaces are Used

HyperSQL query interfaces are constructed by inserting
a small set of HyperSQL descriptors and
HTML (Hypertext Markup Language) formatting into
text files. No compilation is necessary because commands
are interpreted and carried out by our special gateway,
positioned between the remote databases and the browser
software. Operation of the gateway is transparent:
users need not be aware of its presence.

A HyperSQL query interface is composed of a query
menu, query form, results screen, and
browse screens (Figure 1). The query menu is the
starting point, presenting
information about the database and displaying a menu of
queries, organized by subject. Choosing one of the
selections brings up a query form. HyperSQL
automatically prompts the user for a password if one is
required to gain access to the database.
Figure 1. Operational model of a HyperSQL query
interface
Figure 2. Example of a query form (shown in a Netscape browser).
Callouts in the figure: querylists present the user with a list of
"allowed" values from the database; the wildcard character matches
all data values; Clear clears the form and restores default values;
Retrieve submits the form to the database; the pulldown menu sets
the line limit for database output; Keyword Search brings up a
keyword search.



The user enters search criteria on the query form
(Figure 2). He/she can fill in as much information as
available, or none at all; by default all text fields are
supplied with wildcards meaning "match
everything." A row of standard buttons is automatically
provided at the bottom of the form. The Retrieve
button submits the form and returns results. Clear erases
any information entered into the form, replacing it with
default values.

The pulldown menu located next to the Clear button 
is used to limit the number of results returned from the 
query; by default output is limited to 25 records. Clicking 
the Keyword Search button brings up a form that
accepts a text string to be located by searching all data
fields. For later reuse, forms containing favorite or
frequently used sets of values can be saved by selecting
"Add to Bookmarks" (or equivalent) in the browser.

HyperSQL query forms support "query refinement,"
allowing the user to fill in fields incrementally by selecting
from "querylists" (lists of allowed values). Querylists are
useful when the user does not know what the database
contains, or simply wants to select from a list of values
rather than typing.
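The form behavior described above (text fields that default to a wildcard, and a default limit of 25 records) can be sketched as a function that turns form input into a parameterized query. This is not HyperSQL's gateway code, which is not shown in the text. The function name `build_query`, the `specimens` table in the usage note, and the use of Python's built-in `sqlite3` module as a stand-in for Sybase are all assumptions; in particular, `LIMIT` is SQLite syntax and Sybase limits rows differently.

```python
import sqlite3


def build_query(table, form_fields, limit=25):
    """Compose a parameterized SELECT from query-form input.

    form_fields maps column names to the text the user typed.
    Empty fields fall back to the SQL wildcard '%' (match everything).
    Table and column names are assumed to come from the interface
    definition, never from the user, so only values are bound.
    """
    columns = sorted(form_fields)
    conditions = " AND ".join(f"{column} LIKE ?" for column in columns)
    where = f" WHERE {conditions}" if conditions else ""
    params = [form_fields[column] or "%" for column in columns]
    sql = f"SELECT * FROM {table}{where} LIMIT ?"
    return sql, params + [limit]


def run_form(connection, table, form_fields, limit=25):
    """Execute the composed query and return the matching rows."""
    sql, params = build_query(table, form_fields, limit)
    return connection.execute(sql, params).fetchall()
```

For example, a form with "Asp%" typed into a genus field and the country field left empty would produce `SELECT * FROM specimens WHERE country LIKE ? AND genus LIKE ? LIMIT ?` with the values `%`, `Asp%`, and 25 bound in that order.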

HyperSQL also supports keyword search, making it
possible to locate records that contain keyword(s) within
any data field (Figure 2). This mechanism provides an
alternate way to begin; results are presented in the same
hypertext-style format as results from a database search.
Keyword search is particularly useful when the information
desired is located in unexpected places or embedded in
descriptions, notes, or memo fields.

Query results are
displayed as a list of "headlines" on the results screen
(Figure 3). Headlines are single line summaries giving a
general identifier for each database record that matched the
query criteria. At this point, users may choose to explore
one of the headlines (by clicking on it), or may return to
the query form to revise the search. (It is always possible
to return to the previous screen using the Back key
provided on the browser). All forms and output screens
can also be saved or printed using the browser's facilities.
Figure 3. Example of a results screen

Figure 4. Example of a browse screen. Callouts in the figure:
clicking on querylinks returns more detailed information; users can
browse additional querylinks by using the browser's BACK key to
return to the headlines screen.

Clicking on a headline brings up a browse screen with
more detailed information (Figure 4). A browse screen
also provides hyperlinks to access Web documents or other
databases and software. In addition, HyperSQL
introduces a special style of link called a querylink
(marked with a yellow "query" icon, as shown in Figure 4),
which activates a pre-formulated query to obtain more
information from the database. Results from querylinks
are displayed on additional browse screens. At the bottom
of the browse screen is a Web search button that can be
clicked to search the World Wide Web for documents and
databases related to the
screen.

3. How HyperSQL Query Interfaces are Constructed



This section presents an overview of how HyperSQL
query interfaces are constructed. The first subsection
discusses how query forms are built, with the second
subsection explaining how SQL is generated from the
query forms. The final subsection describes how output
is formatted on results and browse screens.
Figure 5. Layout of a sample query form



3.1 Building a Query Form

Query forms are composed of five sections: banner,
header, form body, standard buttons, and a footer; Figure
5 shows where these regions appear in the example query
form. The layout of the query form is specified in the
query file, using "query form" descriptors [NPHM96]. As
their names suggest, banner, header, and footer
descriptors are used to define what should appear in
predefined decoration areas. Since the decoration
descriptors accommodate any HTML text, the
programmer can include hyperlinks, graphical imagemaps,
tables, and formatted text in those areas. If a password is
required for the database, HyperSQL automatically
prompts for the password before bringing up the query
screen.

The banner area is located at the top of the screen. It
typically displays a logo or database name and is used on
all screens in the interface. In contrast, the header (located
below the banner) usually varies to reflect the nature of the
information on each screen.

The form body section specifies what prompts and data
entry fields should be presented to the user. HyperSQL's
INPUT descriptor offers a variety of styles, including
radio lights (lists of choices controlled by diamond-shaped
buttons), buttons, pulldown menus, and querylists
(scrollable lists that are filled with information acquired
automatically from the database), as well as simple text
entry fields. A row of standard buttons automatically
appears below the form body. The function of the buttons
was described in Section 2.



The footer area, located at the bottom of the query
form, can be used for a variety of purposes: to display
links to the query menu or related pages; to identify the
author and version of the software; or to furnish a button
for sending feedback to the developers.

3.2 How SQL is Generated

When the user clicks the Retrieve button on the
query form, the contents of the form are sent to the
HyperSQL interpreter for processing. HyperSQL first
establishes a connection to the remote database, using
information from the environment section of the query file.
(Since connections are short-lived, HyperSQL logs into the
host computer and database for every query submission.)
Depending on options specified in the "SQL" section of
the query file, HyperSQL can either (a) invoke a specified
SQL procedure already stored in the database's procedure
cache, using a list of values from the query form, or (b)
dynamically compose SQL code from the user input and
the set of query directives specified in the query file. After
transmitting the code to the database, HyperSQL awaits
the results.
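
Option (b), dynamic composition, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HyperSQL's actual interpreter: the function name `compose_sql`, the argument layout, and the quote-escaping rule are hypothetical. Only the idea that a `$` placeholder in a WHERE-clause template receives the user's value is taken from the sample query file's SUB ... WHERELIST directives.

```python
def compose_sql(select_list, from_list, where_list, subs, form_values):
    """Build a SELECT statement from query-file directives and form input.

    subs maps a form field name to a WHERE-clause template in which '$'
    stands for the user's value (modeled on SUB ... WHERELIST).
    """
    clauses = list(where_list)  # fixed WHERELIST conditions always apply
    for field, template in subs.items():
        value = form_values.get(field, "").strip()
        if not value:
            continue  # a blank form field adds no constraint
        escaped = value.replace("'", "''")  # double single quotes for SQL
        clauses.append(template.replace("$", escaped))
    sql = "SELECT " + ", ".join(select_list) + " FROM " + ", ".join(from_list)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


query = compose_sql(
    select_list=["Headline_org.mgd_germplasm_record_num", "headline"],
    from_list=["Organism", "Headline_org"],
    where_list=["Organism.germplasm_type = 'Bacteria'"],
    subs={
        "genus": "upper(Organism.genus) like upper('$')",
        "species": "upper(Organism.species) like upper('$')",
    },
    form_values={"genus": "Escherichia", "species": ""},
)
print(query)
```

Because the species field is left blank, only the genus constraint is appended to the fixed condition. A present-day implementation would normally pass user values as bound parameters rather than splicing them into the SQL text.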

Errors from the underlying database and the network
layer are trapped automatically and displayed prominently
at the top of the screen. For debugging, HyperSQL
provides a debugging descriptor which can be activated to
display all SQL code generated by the HyperSQL
interpreter.

Figure 6. Layout of a sample browse screen



3.3 Formatting Database Output 

Query results are formatted as hypertext, using "output"
descriptors specified in the query file. There are two types
of output screens, results and browse, controlled by
separate sections of the query file. A results screen
presents a list of single lines, summarizing the results
returned from the database or keyword search engine.
Data items in the output which cross-reference other
information in the database are formatted as querylinks, a
special kind of hyperlink designed to retrieve related or
more detailed information from the database.

When the user selects a querylink, HyperSQL
automatically invokes the SQL procedures associated with
the selected querylink, passing the specified data fields as
parameters. Since all querylinks on a results screen are
statically associated with the same set of SQL procedures,
the code is pre-stored in the database's procedure cache to
improve response time. Browse screens display the results
from following querylinks.

The same output form descriptors are used for both
results and browse screens: banner, header, form body,
and footer. Figure 6 shows these regions on an example
output screen. Note that output forms are very similar to
query forms, except that the standard buttons are different
and the form body displays output instead of blanks for
entering input.



4. Query File Organization

A HyperSQL-based query interface is composed of a
set of text files defining the screens. The query [...],
which contains only HTML [...]
qindex.html. The remainder of the files [...]
query files and their names [...]. As a
collection they provide a complete [...]
interface (e.g., [...])
contains [...]
embedded HTML [...].

A query file must include exactly one each of the
environment, query, and SQL sections, plus [...] or more
output (results and browse) sections. The sections,
delimited by BEGIN and END statements, [...] in [...]

1. ENVIRONMENT: contains information for connecting to the
target database and sets the banner for all interface screens.

2. QUERY: describes the layout [...]. It specifies formatting
for the form header, input fields on the form body, and the footer.

3. SQL: controls how SQL code [...] information entered on the
query form. Alternatively, this section can invoke a SQL
procedure [...] or an external script or program.

4. RESULTS: describes the layout of query output providing a
list of result records. It specifies formatting for a header,
query results, and footer.

5. BROWSE: describes the layout of screens containing detailed
information. It specifies formatting of a header, data fields,
and footer.
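
Since sections are delimited by BEGIN and END statements, a reader for such a file can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `split_sections`, the regular expression, and the choice to return a dictionary of statement lines are assumptions, and the real HyperSQL parser may behave differently (for example, in how it reports an unterminated section).

```python
import re

# Section keywords taken from the list above; BROWSE may carry a screen number.
SECTION_START = re.compile(
    r"^(ENVIRONMENT|QUERY|SQL|RESULTS|BROWSE(?: \d+)?) BEGIN$"
)


def split_sections(text):
    """Return {section name: [statement lines]} for a query file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        key = " ".join(stripped.split())  # collapse runs of whitespace
        start = SECTION_START.match(key)
        if start:
            current = start.group(1)
            sections[current] = []
        elif current is not None and key == current + " END":
            current = None  # section closed
        elif current is not None:
            sections[current].append(stripped)
    return sections


sample = """
ENVIRONMENT BEGIN
USEDATABASE mgd LOGIN=MGD PASSWORD=PROMPT;
ENVIRONMENT END
QUERY BEGIN
HEADER "<center><H2>Bacterium Query</H2></center>";
QUERY END
BROWSE 1 BEGIN
HEADER "<h1><center>Bacterium Screen</center></h1>";
BROWSE 1 END
"""
sections = split_sections(sample)
print(list(sections))
```

Lines outside any section are ignored in this sketch; a stricter reader would reject them and would also check that each required section appears exactly once.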

ENVIRONMENT BEGIN
  USEDATABASE mgd LOGIN=MGD PASSWORD=PROMPT;
  BANNER "<center><img src=/HYPERSQL/gif/mgd.gif></center>";
ENVIRONMENT END
QUERY BEGIN
  HEADER "<center><H2>Bacterium Query</H2></center>";
  INPRINT "<B>Bacterium Name</B><br>";
  INPRINT "Genus "; INPUT genus TYPE=TEXTINPUT DEFAULT="Escherichia";
  INPRINT "Species "; INPUT species TYPE=TEXTINPUT DEFAULT="";
  FOOTER "<hr>Feedback to mgd@mgd.cordley.orst.edu";
QUERY END
SQL BEGIN
  SUB genus WHERELIST AS upper(Organism.genus) like upper('$');
  SUB species WHERELIST AS upper(Organism.species) like upper('$');
  FROMLIST Organism, Headline_org;
  SELECTLIST Headline_org.mgd_germplasm_record_num, headline;
  WHERELIST Organism.mgd_germplasm_record_num = Headline_org.mgd_germplasm_record_num
    AND Organism.germplasm_type = 'Bacteria';
SQL END
RESULTS BEGIN
  HEADER "<h1><center>Bacterium Headlines</center></h1>";
  OUTPRINT "<br>DATA_FIELD=mgd_germplasm_record_num DATA_FIELD=headline";
  LINK headline TO mgdGetBacterium(mgd_germplasm_record_num) BROWSE_SCREEN=1;
  FOOTER "<hr>Feedback to mgd@mgd.cordley.orst.edu";
RESULTS END
BROWSE 1 BEGIN
  HEADER "<h1><center>Bacterium Screen</center></h1>";
  OUTPRINT SUPPRESS_IF_EMPTY "<li>Genus: <b>DATA_FIELD=Genus</b>";
  OUTPRINT SUPPRESS_IF_EMPTY "<li>Species: <b>DATA_FIELD=Species</b>";
  OUTPRINT "DATA_FIELD=Organism_Role: <b>DATA_FIELD=Researcher_o<br>";
  OUTPRINT "DATA_FIELD=Collection_Role: <b>DATA_FIELD=Researcher_c<br>";
  FOOTER "<hr>Feedback to mgd@mgd.cordley.orst.edu";
BROWSE 1 END

Figure 7. Sample HyperSQL query file
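
The OUTPRINT lines in the sample file suggest a simple template expansion: each DATA_FIELD=name marker is replaced by the corresponding value from the result record. The Python sketch below illustrates that reading. The function name `expand_outprint`, the HTML escaping of values, and the interpretation of SUPPRESS_IF_EMPTY as "drop the whole line when any referenced field is empty" are assumptions made for illustration, not documented HyperSQL behavior.

```python
import html
import re

FIELD = re.compile(r"DATA_FIELD=(\w+)")


def expand_outprint(template, record, suppress_if_empty=False):
    """Replace each DATA_FIELD=name marker with the record's value."""
    names = FIELD.findall(template)
    if suppress_if_empty and any(not record.get(n) for n in names):
        return ""  # assumed meaning: omit the line when a field is empty
    return FIELD.sub(
        lambda m: html.escape(str(record.get(m.group(1), ""))), template
    )


record = {"Genus": "Escherichia", "Species": ""}
print(expand_outprint("<li>Genus: <b>DATA_FIELD=Genus</b>", record, True))
print(repr(expand_outprint("<li>Species: <b>DATA_FIELD=Species</b>", record, True)))
```

With this record the genus line is emitted and the species line is suppressed, which matches the intent of keeping empty fields off the browse screen.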

5. HyperSQL Implementation 

Figure 8 shows the architecture of HyperSQL. The
software components of a HyperSQL query interface
include the target Sybase database, the HyperSQL
gateway, an HTTP (Hypertext Transfer Protocol) daemon,
a set of screens (displayed by the browser), and an end-user,
assumed to be a scientist or other person who may be
unfamiliar with SQL and the target database. The initial
version of HyperSQL supports databases using Sybase
System 10 [Syb94b], a de facto standard among large-scale
scientific databases. Sybase provides multi-database
management capabilities and a programmatic interface
based on SQL and the Open Client Library [Syb94a], a
standard network communication protocol. The Open
Client architecture makes it possible for the HyperSQL
gateway and target Sybase databases to reside on different
computers across the Internet. However, other database
vendors have not adopted the interface and we plan to
rewrite the communication interface using DBI (Database
Interface) [Bun95], a freeware effort under development
by a number of third-party database programmers. By
taking advantage of DBI we will be able to support other
databases, including Oracle, Informix, INGRES, mSQL,
Empress, C-ISAM, DB2, Quickbase, and Interbase.

The core of the HyperSQL software is the HyperSQL
gateway, composed of an interpreter and a
communications back-end. The interpreter carries out the
database operations described in the query files,
transparently composing SQL queries, establishing RPCs
(Remote Procedure Calls) to the remote databases, and
formatting database results according to query file
specifications (Figure 9). The back-end module contains
an interface to the Open Client Library. Other
communications drivers, such as Microsoft's Open
Database Connectivity and OpenLink's UDBC (Universal
Database Connectivity) [Ope95], can be incorporated into
the gateway by writing a replacement back-end.

The number of HyperSQL gateways in operation at a
given time is limited only by host computer memory, the
number of processes running, and the number of users
simultaneously logged into the database. Each database
transaction (submitting a query form or clicking on a
querylink) invokes a new copy of the gateway, which
terminates automatically after processing the query results.
This style of interaction was chosen over maintaining a
continuous login session, because it is not clear in a
browser-based interface when a given user session ends
(i.e., interface screens may reside in the browser's cache
indefinitely).

An HTTP daemon [NCSA95], available from NCSA
and other sources, processes browsers' HTTP (Hypertext
Transfer Protocol) requests, received on a dedicated
machine port (80). The daemon automatically calls
the gateway when HyperSQL services are required to build
a screen or submit a query. There is a restriction that the
daemon and gateway reside on the same machine, because
the daemon invokes the gateway as a UNIX child process.
The query file must also be accessible to the gateway
software.
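
The one-process-per-transaction model described above can be illustrated with a short Python sketch. It is not the HyperSQL gateway or the NCSA daemon: `GATEWAY_SOURCE` is a stand-in script invented for this example, and `handle_request` is a hypothetical name. The point is only that the parent starts a fresh child for each request, passes the form data on standard input, collects the generated page from standard output, and lets the child exit.

```python
import subprocess
import sys

# Stand-in for the gateway executable: a tiny script that reads the form
# data from standard input and writes an HTML page to standard output.
GATEWAY_SOURCE = (
    "import sys\n"
    "form = sys.stdin.read()\n"
    "print('<html><body>query: ' + form + '</body></html>')\n"
)


def handle_request(form_data):
    """Run one short-lived gateway process for a single request."""
    completed = subprocess.run(
        [sys.executable, "-c", GATEWAY_SOURCE],
        input=form_data,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error status
    )
    return completed.stdout


page = handle_request("genus=Escherichia")
print(page)
```

Because no state survives between calls, the parent never has to decide when a user session has ended, which is the design rationale given in the text.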

6. Related Work

Tool developers have constructed a variety of toolkits
for constructing browser-based interfaces to remote
databases. We have classified the toolkits into three
categories, based on how query interfaces are specified:
(1) interoperability languages (GSQL, WDB, WebinTool,
W3-MSQL), (2) schema-based tools (Zelig, Genera/Web,
UMASS Information Navigator), and (3) GUI-based
environments (Cold Fusion, dbWeb, Sapphire).

6.1 Interoperability languages 

Several toolkit developers use a language approach to
interfacing Web browsers to remote databases.
Developers construct query files by embedding special
directives into HTML (Hypertext Markup Language) text
files. Since all of the languages provide minimal features
for building forms and generating queries, they differ
primarily in number of features, syntax, and programming
"friendliness". The tools employ a special CGI interpreter
to intercept the directives, carry out the corresponding
database operations, and format database results into
HTML. Figure 9 illustrates how [...]

Figure 9. How CGI-based [...]
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(where users enter query constraints) and format query 
results. The definition files permit embedded HTML, as
well as calls to Perl functions, for additional formatting.
WDB also supplies a utility for extracting the database
schema from the database and automatically creating a
working template form definition file. Like GSQL, the
software must reside on the same machine as the database.
WDB currently supports Sybase, Informix, and Mini-SQL,
a freely available database which implements a subset of
SQL.

WebinTool [Ins95] query interfaces are composed of a
web page template and a set of macro directives (prefixed
by periods), embedded directly into HTML text files.
WebinTool offers a number of enhancements to GSQL:
arithmetic operations, variable translation (i.e., uppercase,
lowercase, escape, unescape), control directives (.IF,
.ELSEIF, .ELSE, .STOP, .ABORT) and file directives for
reading in a file for display (.READFILE). WebinTool's
GSQL-style query directives (.SELECT, .FROM,
.WHERE, .UNION, .SORT) specify how SQL is to be
generated from user input from query forms. References
to local and output variables are prefixed with a csh-style
'$'; a different prefix, [...], is used to dereference input
variables.

Improving on GSQL, WebinTool provides
comprehensive support for formatting database output.
Template directives (.TPLDEF ... .ENDTPL) specify how
output column variables are to be printed (left/right
adjusted; maximum length). Data items in the query
results can be linked to related information in the database
(.ARG LINK). The current version of WebinTool is
restricted to the machine containing the database software. It
supports INGRES, Informix and Mini-SQL, and can
support other SQL-based databases by replacing the
backend.

Of all the GSQL-style languages, W3-MSQL [Hug95]
contains the smallest number of descriptors. Programmers
must understand UNIX and C-style concepts; for example,
file seek and fetch operations and how to allocate and free
handles. Unlike most of the other GSQL-style languages,
W3-MSQL provides an IF construct, included primarily
to alter output formatting based on the values of data fields
returned from the database. W3-MSQL supports Mini-SQL
databases and, like the other tools, must be executed
from the same machine running the database software.

6.2 Schema-based tools

Each of the tools described in this section is driven by
programmer-supplied "schemas," although the
term is used somewhat differently in each case. Zelig
[VH94] uses "schema-based" specifications to guide
forms generation. Here "schema" refers to a set of
directives embedded into HTML files (coded as
comments). The authors take a novel approach to
monitoring user behavior in query interfaces. They embed
an expert system, written in OPS5, into the interface to
accumulate statistics on database access and monitor
interface usage patterns. The rule-based module offers the
database administrator advice on modifications to the
underlying data structures for optimizing access time and
storage requirements of the database system.

Another schema-based tool is Genera/Web [Let94].
Using Genera, developers specify query interfaces entirely
in the tool's proprietary object-oriented schema language
by defining and expressing relationships between
"entities," groups of related information in the database.
An entity is a collection of fields from one or more tables,
views, or other entities, grouped together in a logical set.
Fields are permitted to contain links to other databases.
From the schema, Genera's compiler generates the HTML
code for the query interface and the corresponding SQL
procedures to be invoked by the query interface.

Genera, designed for Sybase databases, eliminates the
need for developers to code directly in SQL and HTML,
but at the expense of having to learn a complex and
proprietary schema language. For developers who are
already proficient with SQL, Genera introduces an
unnecessary level of abstraction, which can be frustrating
when the SQL generated from the schema produces
undesired results. Although developers can edit the
generated SQL stored in the database, Genera
automatically overwrites all of the SQL code any time the
compiler is invoked, even when simple cosmetic changes
are made to the interface.

The third schema-based tool is the UMASS Information
Navigator [Hud95], designed to support easy navigation of
relational databases. Hudson's tool generates query
forms from a meta-data "schema," which contains
information about the organization of the database. After
entering search criteria into forms, the tool returns results
which in turn can be clicked on to bring up more detailed
information in the local database, on other Web pages, or
from other databases. The Navigator is ideal for storing an
entire Web-based information system. Because it
generates Web pages dynamically, the Navigator avoids
the "stale link" problem, a major source of frustration for
Web users.

A configuration file contains the opening screen
information, including the views contained in the meta-database.
The Navigator, which supports both Oracle and
Basis+, also permits users to use the results of relational
queries as starting points for searching using full-text
retrieval engines such as WAIS or INQUERY.

6.3 Query Interface Construction Tools

The tools presented in this section are designed to assist
programmers in furnishing Web access to ODBC-compliant
(Open Database Connectivity) software, which
includes SQL databases and spreadsheets, on Windows
95 and NT platforms. (Similar tools are not yet available
on the UNIX platform.)

Cold Fusion [All95] is a CGI-based tool that reads
template files to build query interfaces to Windows-based
databases. In response to user queries, Cold Fusion
creates dynamic HTML pages by mixing HTML tags and
Database Markup Language (DBML) from the templates.
DBML dictates how queries are formulated and database
results are displayed. Using query interfaces created with
Cold Fusion, end-users can both update and query
databases by interacting with forms displayed on browsers.
Cold Fusion provides a number of features targeting
financial applications, including special formatting of
currency, date, and time fields.

Aspect Software Engineering's dbWeb [Lau95] is a
data-driven gateway between ODBC data sources and NT-based
web servers, implemented as a multithreaded NT
service. Forms can be created for both updating and
retrieving information from databases using dbWeb's
administration tools. The tools generate a combination of
HTML and special "tags" (similar to Cold Fusion's
DBML) which specify the database operations to be
performed by the gateway. At runtime, the gateway
intercepts the tags, performs the specified database
operations, and wraps the results in HTML for the
browser. Database information which contains pointers to
related data in the database is formatted as "smartlinks"
in the browser, permitting users to navigate and browse the
database by clicking links.

Bluestone's Sapphire/Web [Blu95] is a GUI-based
programming tool for creating Web-based database
interfaces for the three major SQL databases: Sybase,
Oracle, and Informix. Sapphire, available for both the X
Window System (not yet released) and Windows 95/NT, is
similar in operation to Powersoft's PowerBuilder, in that
both tools permit the user to build a database interface
using direct manipulation. The tool generates C/C++
source code which can be modified or customized by the
interface developer.

Sapphire provides no direct support for designing
HTML-based forms. Developers can create HTML forms
manually using a text editor or by using HTML authoring
tools such as HotMetal, WebMagic, or the Internet
Assistant built into Microsoft Word. Application "objects"
such as stored database procedures, dynamic SQL,
functions, executables, files, and other objects (i.e., OLE)
are dragged and dropped into the "Bind Editor," where
HTML elements such as text input fields or drop-down
menus are "bound" to arguments; database results are also
bound to HTML elements for formatting. Once the
interface is defined, the tool generates C and C++, which
can be customized manually by programmers, who can
introduce conditional processing or change the default
method of populating the [...].
[...] does not entirely eliminate [...]
convenient means of assembling [...]-based
database interfaces.

When we started our work, [...]
existed. Although GSQL demonstrated [...]
could interoperate with relational [...]
to simple input forms; lacking [...] formatting and
cross-referencing output to related [...] was
unsupported. The drawback to Genera was having to
learn its high-level schema language; the tool
automatically generates both the interface [...]
high degree of automation is a drawback [...]
are made for queries for which the precise SQL [...]
known; the programmer must unnecessarily translate SQL
into schema language.

Furthermore, most of the subsequently [...]
we have examined do not [...]
requirements for publishing
scientific databases, which are traditionally [...]
UNIX workstations [...]:

• do not connect to remote [...]
the machine running [...]
beyond a [...] local area network;

• do not help users [...]
displaying a list of allowed [...];

• furnish no means of [...]
(e.g., password protection);

• lack means of cross-referencing [...]
related data [...]
scan for [...]

At the time this paper was prepared, [...]
been successful in building query [...] in support of [...]
three major biological databases:

• Microbial [...]
germplasm [...]
throughout the U.S.

• Mycological Types [...] [PiiS96]:
searchable descriptions of fungi types [...]

• Vascular Plant Types [...]:
searchable descriptions of vascular [...]

These databases have been accessed from more [...]
countries, including Brazil, Canada, Germany, Russia,
Sweden, and the US. Feedback from programmers [...]
database end-users has been encouraging. Additional
databases are expected to come online [...]
in the near future. We are also considering adapting
HyperSQL to support public [...] databases
using DBI (Database Interface).






We have begun development of a next-generation Web
tool called QueryDesigner, which will enable
non-computing professionals to locate, [...], build
personalized query interfaces to remote SQL databases.
No knowledge of SQL, HTML, or HyperSQL will be
required. Current information on QueryDesigner and
HyperSQL, plus a list of supported databases, is available
at http://mgd.cordley.orst.edu/hyperSQL.
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Abstract 

We present a web-based course system for teaching
and training industry employees. This system is
designed to synchronize and retrieve the data related to
HTML lecture notes and lecture video. The overall system
consists of two frameworks, the Web-based Synchronized
Multimedia Lecture (WSML) framework and the XML-Based
Synchronized Hypermedia Video Query (XSHVQ)
framework. The WSML system synchronizes the
presentation of the streaming video lecture, the HTML-based
lecture notes and the HTML Navigation Events. It
also automates the recording of these three media events
based on the Synchronized Multimedia Integration
Language (SMIL) specification. To create an online
course using this system, teachers first convert their lecture
notes into HTML pages and present the HTML-based
lectures in a digital studio. Then the synchronized video
clips and the associated HTML-based notes are deposited
in our course database for on-demand self-paced learning.
The XSHVQ system is designed to efficiently search for
the desired video segments based on the synchronization
[...] based notes. To query desired video segments using
the XSHVQ system, users simply use "keyword" search for the
HTML lecture notes and the system can locate the
corresponding video segments and present them to users.

1. Introduction

As a result of emerging technologies for multimedia
data processing, "WWW-based online courses" are
becoming popular for distance education [1-7] and can
serve as a major means of industry employee training.
Currently, most online courses provide either unguided
HTML pages or course video with blurred images of
lecture notes (e.g., RealVideo video streaming systems,
shown in Fig. 1). Both distance lecture modes have some
inevitable downsides. Aiming to facilitate the authoring,
depositing, and presenting of online course lecture
material, we have endeavored to develop various
synchronized multimedia lecture systems. In our approach,
we use streaming video for the AV lecture, and use
dynamically loaded HTML pages to present the associated
lecture notes, since HTML pages can present static text and
images in much higher resolution than that presented by a
low bit-rate video clip.




Figure 1. Distorted image of RealVideo at 16 Kbps

In this paper, we propose two frameworks, the Web-based
Synchronized Multimedia Lecture (WSML)
framework and the XML-Based Synchronized Hypermedia
Video Query (XSHVQ) framework, for the purposes of
data synchronization and query, respectively. The WSML
framework synchronizes the presentation of the streaming
video lecture, the HTML-based lecture notes and the
HTML Navigation Events. It also automates the recording
of these three media events based on the Synchronized
Multimedia Integration Language [32] (SMIL)
specification. To create an online course using this system,
the teachers first convert their lecture notes into HTML
pages in advance. They present the HTML-based lectures
in a digital studio equipped with a camcorder/digitizer for
AV lecture recording and digitizing. Then the
synchronized video clips and the associated HTML-based
notes are deposited in our Online Course Database for
on-demand self-paced learning. The implementation of the
WSML framework will be introduced in Section 2.

As the size of the online course database increases,
efficient search for the desired video segments becomes an
important issue. There have been various approaches to
content-based video query [8-12]. These methods basically apply
signal processing or image processing technologies, for
example "scene-change detection" or "video segmentation"
technologies, to extract the desired video segments.
However, most video clips of course lectures in our
database contain no significant "scene-change", and
thereby current content-based video query algorithms may
not be applicable. In this paper, we have developed the
XSHVQ framework on top of the WSML system to address
this issue. We apply the "Metadata" concept [13-16] in our
system for structural description and management of the
WWW lecture video and HTML lecture notes in our
course lecture database. We assume that the
synchronization between the video clips and the
corresponding HTML pages has been done by the WSML.
This synchronization information provides a very
convenient and efficient way to index the video segments
and can be built into a look-up table to represent the
associated correspondences. To query desired video
segments using the XSHVQ framework, users simply use
"keyword" search for the HTML lecture notes and the
system can locate the corresponding video segments via a
look-up table and present them to users. The underlying
concept and implementation of the XSHVQ system will be
described in Section 3.
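
The look-up described above is a two-step mapping: keyword to HTML pages, then HTML page to the video segments during which that page was shown. A minimal Python sketch follows. The function name `find_segments`, the table layout, the page names, and the time values are all hypothetical; the paper does not specify the data structures at this point.

```python
def find_segments(keyword, keyword_to_pages, page_to_segments):
    """Map a keyword to (page, start, end) video segments via two tables."""
    hits = []
    for page in keyword_to_pages.get(keyword.lower(), []):
        for start, end in page_to_segments.get(page, []):
            hits.append((page, start, end))
    return sorted(hits, key=lambda hit: hit[1])  # order by start time


# Hypothetical index data; times are seconds from the start of the lecture.
keyword_to_pages = {"xml": ["notes3.html", "notes1.html"]}
page_to_segments = {
    "notes1.html": [(0, 95)],
    "notes3.html": [(210, 340), (600, 655)],
}
segments = find_segments("XML", keyword_to_pages, page_to_segments)
print(segments)
```

Because the synchronization information already ties each page to time intervals, no analysis of the video signal is needed at query time.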



2. Web-based Synchronized Multimedia Lecture
Framework

Our Web-based Synchronized AV/HTML
Navigation Distance Lecturing Framework consists of
three major modules (as shown in Fig. 2): (1) WSML
Recorder, for recording the temporal information of the
AV lecture and the HTML-based lecture notes navigation
process; (2) WSML Event Server, for receiving,
depositing, and multicasting/unicasting all WSML events;
(3) WSML Browser, for synchronized presentation of the
AV lecture and HTML-based lecture notes navigation.

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Figure 2. Framework of the proposed WSML 

2.1 WSML Recorder 

The major function of the WSML Recorder is to
record the video together with the associated events during
the lecture. The recorder consists of (1) Timer, (2) AV
Encoder, and (3) Navigation Event Logger (as shown in Fig.
3).

Timer: initializes the recording process. Each time
the teacher starts the recorder, the timer is initiated
and becomes the time-line of the WSML. At the same time,
the timer automatically activates the AV Encoder and
the Navigation Event Logger.

AV Encoder: encodes the AV lecture. It also sends
and saves the compressed AV lecture to the AV server for
subsequent live broadcasting or lecture on demand.



Navigation Event Logger: records the temporal
information of the AV and the HTML navigation events.
The major events that occur during the HTML-based lecture
notes navigation include the mouse event and the
URL event (illustrated in Fig. 4). Events induced at
the teacher site are sent to the Navigation Event
Logger for recording. The mouse events include
information regarding mouse drag coordinates, highlighted
region, and the scrolling offset. The URL event may occur
at the moments the teacher inputs a URL, loads an HTML page by
pressing a hyperlink, browses backward, or browses
forward. The Navigation Event Logging process is
exemplified in Fig. 4.
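
The logging step can be sketched in Python as below. This is an illustrative sketch only: the class name, field names, and event payloads are assumptions, and timestamps are passed in explicitly so the example is deterministic (a real recorder would read a clock). The essential idea taken from the text is that each event is stamped with its offset on the timer's time-line.

```python
from dataclasses import dataclass, field


@dataclass
class NavigationEventLogger:
    """Collects navigation events stamped against the recorder's timer."""
    start_time: float
    events: list = field(default_factory=list)

    def log(self, now, kind, detail):
        # Store the offset from the timer start, not the wall-clock time,
        # so events can later be replayed against the AV stream.
        offset = round(now - self.start_time, 3)
        self.events.append({"t": offset, "kind": kind, "detail": detail})


logger = NavigationEventLogger(start_time=100.0)
logger.log(100.0, "url", "lecture1.html")
logger.log(112.5, "mouse", {"scroll_offset": 240})
logger.log(131.0, "url", "lecture2.html")
for event in logger.events:
    print(event)
```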

2.2 WSML Event Server 

Fig. 5 illustrates the interactions among the WSML
Recorder, WSML Event Server, and WSML Browser. The
WSML server is responsible for the receiving, depositing,
and transmitting of the WSML events. In the live
broadcasting lecture mode, the WSML Event Server
receives various WSML events from the Navigation Event
Logger, and then broadcasts them to the WSML Browser at the
client sites. At the same time, those events will be encoded
(by the SMIL gateway) into SMIL 1.0 compliant format to
be deposited in the SMIL database for on-demand requests.
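
As a rough illustration of the encoding step, the Python sketch below turns a list of URL events into a small SMIL-style document in which the lecture video plays in parallel with a sequence of pages. The function name, the file names, and the choice of the generic `ref` media element for HTML pages are assumptions; the paper does not show the markup its SMIL gateway actually emits, and mouse events are omitted here.

```python
import xml.etree.ElementTree as ET


def events_to_smil(video_src, url_events, total_seconds):
    """Encode URL events as a SMIL document played in parallel with video.

    url_events is a list of (offset_seconds, page_url), sorted by offset.
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # children of <par> play together
    ET.SubElement(par, "video", src=video_src)
    seq = ET.SubElement(par, "seq")   # pages follow one another in order
    for i, (offset, url) in enumerate(url_events):
        if i + 1 < len(url_events):
            end = url_events[i + 1][0]
        else:
            end = total_seconds
        attrs = {"src": url, "dur": f"{end - offset}s"}
        if i == 0 and offset > 0:
            attrs["begin"] = f"{offset}s"  # delay before the first page
        ET.SubElement(seq, "ref", attrs)
    return ET.tostring(smil, encoding="unicode")


doc = events_to_smil("lecture.rm", [(0, "p1.html"), (40, "p2.html")], 100)
print(doc)
```

Each page's duration runs until the next URL event, so replaying the sequence alongside the video reproduces the original navigation timing.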




Figure 3. Components of the WSML Recorder

Figure 4. An event logging example

Figure 5. The detailed interactions among the WSML Recorder,
WSML Event Server and WSML Browser

2.3 WSML Browser

The WSML Browser presents the AV lecture
and replays the corresponding HTML navigation process
based on the information broadcast from the WSML Event
Server (as shown in Fig. 6). The major components of the
WSML Browser include the WSML Reader, [...]
AV player and HTML browser. Taking the WSML events
illustrated in Fig. 4 as an example, the WSML Browser
will present the WSML in the following sequence.




Figure 6. Framework for the WSML Browser



3. The XML-Based Synchronized Hypermedia Video 
Query (XSHVQ) Framework 

To allow users to search efficiently for relevant
lecture video segments using keyword-based query, we
need to clearly define the relationship between the
keyword and video segments. In our framework, we
propose the "metadata" concept to describe the
synchronization relationships among those relevant media. We
defined the metadata for describing the temporal
relationships between the "Keyword to HTML notes" and
"HTML notes to Video Segments".

Figure 7. The XSHVQ Framework

On the other hand, as the contents of the online
course continue to grow, the metadata for the online course
hypermedia will become more complicated. In our design,
considering that the metadata is tree-structured data, we use
XML [17-19], which is a meta language ideally suited
for describing tree-structured data, to define the
synchronization meta information. Based on the concept
addressed above, we proposed the XML-Based Synchronized
Hypermedia Video Query (XSHVQ) framework for the
video query in our WSML System addressed above. In the
proposed XSHVQ framework, we apply XML-based meta
information to structurally describe and organize the
WWW lecture video and HTML lecture notes in our
course lecture database. As shown in Fig. 7, the XSHVQ
framework contains four crucial components:

1. AV/HTML Synchronized Information Recorder: for 
logging the synchronization information between the 
lecture video and the associated HTML-based lecture notes. 

2. HTML Lecture Notes Keyword Analyzer: for keyword 
filtering and weighting from the texts in the HTML 
lecture notes. 

3. Query Server: for generating and processing the index 
files of the XML data. 

4. AV/HTML Merger: for integrating the video 
segments and the corresponding HTML-based lecture 
notes for presentation. 

3.1. AV/HTML Synchronized Information Recorder 

The AV/HTML Synchronization Information 
Recorder is used to automatically log the synchronization 
information of lecture video segments and associated 
HTML lecture notes. The synchronization information is 
formatted according to the XML syntax and deposited into 
the Synchronization Table (cf. Fig. 8). During 
playback, the system presents to end users the AV 
segments together with the synchronized HTML pages 
based on the synchronization information specified inside 
the Synchronization Table. 

Figure 8. AV/HTML Synchronized Info. Recorder 

The format of the Synchronization Table is compliant 
with XML syntax (cf. Table 1). The tag <HTML 
id="chap15_1"> indicates the specific ID of the HTML 
lecture note. The tag <AV_REF idref="ch15_av" 
start="00:00:00" end="00:55:19"/> indicates the starting/ending 
time of the lecture video segment synchronized with this 
HTML page. 



<SYNC_TABLE> 
  <HTML id="chap15_1"> 
    <AV_REF idref="ch15_av" 
            start="00:00:00" end="00:55:19"/> 
  </HTML> 
  <HTML id="chap15_2"> 
    <AV_REF idref="ch15_av" 
            start="00:55:19" end=""/> 
  </HTML> 
  <HTML id="chap16"> 
    <AV_REF idref="ch16_av" 
            start="00:00:00" end="01:55:34"/> 
  </HTML> 
</SYNC_TABLE> 

Table 1. The Synchronization Table 
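
To make the playback use of the Synchronization Table concrete, the sketch below loads a table in the format of Table 1 and looks up which HTML page accompanies a given playback time. It is an illustrative Python sketch only: the function names to_seconds, load_sync_table and page_at are hypothetical, and treating an empty end attribute as "open-ended" is an assumption rather than something the paper states.

```python
import xml.etree.ElementTree as ET

SYNC_TABLE_XML = """
<SYNC_TABLE>
  <HTML id="chap15_1">
    <AV_REF idref="ch15_av" start="00:00:00" end="00:55:19"/>
  </HTML>
  <HTML id="chap15_2">
    <AV_REF idref="ch15_av" start="00:55:19" end=""/>
  </HTML>
  <HTML id="chap16">
    <AV_REF idref="ch16_av" start="00:00:00" end="01:55:34"/>
  </HTML>
</SYNC_TABLE>
"""


def to_seconds(hms):
    """Convert an hh:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_sync_table(xml_text):
    """Return {page_id: (av_id, start_seconds, end_seconds_or_None)}."""
    table = {}
    for page in ET.fromstring(xml_text.strip()).findall("HTML"):
        ref = page.find("AV_REF")
        end = ref.get("end", "").strip()
        table[page.get("id")] = (
            ref.get("idref"),
            to_seconds(ref.get("start")),
            to_seconds(end) if end else None,  # assumption: empty end = open-ended
        )
    return table


def page_at(table, av_id, t):
    """Return the page synchronized with second t of the given AV stream."""
    for page_id, (ref_id, start, end) in table.items():
        if ref_id == av_id and start <= t and (end is None or t < end):
            return page_id
    return None
```

For instance, page_at(load_sync_table(SYNC_TABLE_XML), "ch15_av", 100) selects the page chap15_1, whose interval covers the first 55 minutes and 19 seconds of that stream.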

3.2. HTML Lecture Notes Keyword Analyzer 

For the keyword-based query at the client site, we 
generate the keyword index [20] automatically for the 
HTML-based lecture notes. First, all the texts inside the 
HTML files will be processed by the keyword filter, which 
discards the HTML tag texts and "redundant words" such 
as "is, that, this...". After the keyword filtering, the 
remaining texts will be processed by the "weighted" index 
generator (cf. Fig. 9). The weighting is decided primarily 
according to the tags encapsulating the word to be indexed. 
For example, the words inside the HTML tag <TITLE> 
will possess a higher weighting factor than those inside the 
HTML heading tags (such as <H1>, <H2>, ..., 
<H6>). 
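
The filter-then-weight pipeline can be sketched with Python's standard html.parser module. This is a minimal illustration, not the GAIS-based generator used by the authors: the numeric weights (3 for <TITLE>, 2 for headings, 1 otherwise), the stop-word list beyond "is, that, this", and the names WeightedIndexer and index_page are all assumptions made for the example.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# "is", "that", "this" come from the text; "a" and "the" are assumed additions.
STOP_WORDS = {"is", "that", "this", "a", "the"}
# Assumed weights: the text only says <TITLE> outranks the heading tags.
TAG_WEIGHTS = {"title": 3, "h1": 2, "h2": 2, "h3": 2, "h4": 2, "h5": 2, "h6": 2}
DEFAULT_WEIGHT = 1


class WeightedIndexer(HTMLParser):
    """Discards tags and stop words, then weights words by enclosing tag."""

    def __init__(self):
        super().__init__()
        self._stack = []                 # currently open tags
        self.weights = defaultdict(int)  # word -> highest weight seen

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the matching open tag.
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self._stack),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            if word not in STOP_WORDS:
                self.weights[word] = max(self.weights[word], weight)


def index_page(html_text):
    """Return {word: weight} for one HTML lecture note."""
    parser = WeightedIndexer()
    parser.feed(html_text)
    parser.close()
    return dict(parser.weights)
```

A word that appears both in the title and in body text keeps the higher of the two weights, which is one simple way to honor the tag-based priority described above.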








The keyword index files are formatted based on the 
XML DTD and deposited in the Keyword Table (Table 2). 
As shown in Table 2, <CHAPTER_REF 
idref="chap15_1" weight="1"/> indicates that page 
chap15_1 contains the keyword "FTP" with a weighting 
factor of 1. 



<KEYWORD_TABLE> 
  <KEYWORD id="FTP"> 
    <CHAPTER_REF idref="chap15_1" weight="1"/> 
    <CHAPTER_REF idref="chap16" weight="2"/> 
  </KEYWORD> 
  <KEYWORD id="IP address"> 
    <CHAPTER_REF idref="chap15_2" weight="2"/> 
    <CHAPTER_REF idref="chap16" weight=""/> 
  </KEYWORD> 
  <KEYWORD id="Virtual network"> 
    <CHAPTER_REF idref="chap15_2" weight="3"/> 
  </KEYWORD> 
</KEYWORD_TABLE> 

Table 2. The Keyword Table 
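
One way to arrive at a table of this shape is to invert per-page word weights into a keyword-to-pages mapping. The Python sketch below is an assumption about how such a step could look, not the authors' generator; the function name build_keyword_table and its input format are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_keyword_table(page_indexes):
    """Invert {page_id: {keyword: weight}} into a KEYWORD_TABLE element."""
    inverted = {}
    for page_id, words in page_indexes.items():
        for word, weight in words.items():
            inverted.setdefault(word, {})[page_id] = weight

    root = ET.Element("KEYWORD_TABLE")
    for word in sorted(inverted):
        keyword = ET.SubElement(root, "KEYWORD", id=word)
        for page_id, weight in sorted(inverted[word].items()):
            # XML attribute values are strings, so the weight is serialized.
            ET.SubElement(keyword, "CHAPTER_REF", idref=page_id, weight=str(weight))
    return root
```

Calling ET.tostring(build_keyword_table({"chap15_1": {"FTP": 1}, "chap16": {"FTP": 2}}), encoding="unicode") yields a single KEYWORD element for "FTP" with two CHAPTER_REF children, matching the first entry of Table 2.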

Figure 9. HTML Keyword Analyzer 

Figure 10. Procedure of the Query Server 

3.3. Query Server 

The XSHVQ framework applies XML-based 
metadata to describe the synchronization information of the 
multimedia lecture. These metadata are actually 
"semistructured" and thereby difficult to manage with a 
traditional relational DBMS [21, 22]. As the lecture videos 
increase, these XML-formatted metadata could become 
more complicated and hard to maintain. In our project, we 
have designed a Query Server, as shown in Fig. 10, to 
generate the index files of the XML data and process the 
XML data query using the standard XML query language 
XQL (XML Query Language) [23]. 

The Sync Table and the Keyword Table are the crucial 
information for students to locate the desired video 
segments. A typical video query process is exemplified in 
Fig. 11. As shown in the figure, "FTP" is one of the keywords 
of HTML pages Chap15-1 and Chap16. If a user tries to 
locate all the video segments relevant to "FTP", the system 
will first use the Keyword Table to find the HTML 
lecture note pages containing the keyword "FTP". At the same 
time, the Query Server can locate the video segments 
according to the Sync Table, which records the 
synchronization information between the HTML lecture 
notes and the lecture video segments. The video segments 
and the HTML lecture notes are then merged using the 
AV/HTML Merger (described below) for the synchronized 
presentation at the client side. 
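
The two-table lookup described above amounts to a join: keyword to pages via the Keyword Table, then pages to video segments via the Sync Table. The following Python sketch illustrates that join over excerpts of Tables 1 and 2. It is not the XQL/DBMS path the authors use; the function name find_segments and the returned tuple layout are assumptions.

```python
import xml.etree.ElementTree as ET

KEYWORD_TABLE = ET.fromstring("""<KEYWORD_TABLE>
  <KEYWORD id="FTP">
    <CHAPTER_REF idref="chap15_1" weight="1"/>
    <CHAPTER_REF idref="chap16" weight="2"/>
  </KEYWORD>
  <KEYWORD id="Virtual network">
    <CHAPTER_REF idref="chap15_2" weight="3"/>
  </KEYWORD>
</KEYWORD_TABLE>""")

SYNC_TABLE = ET.fromstring("""<SYNC_TABLE>
  <HTML id="chap15_1">
    <AV_REF idref="ch15_av" start="00:00:00" end="00:55:19"/>
  </HTML>
  <HTML id="chap15_2">
    <AV_REF idref="ch15_av" start="00:55:19" end=""/>
  </HTML>
  <HTML id="chap16">
    <AV_REF idref="ch16_av" start="00:00:00" end="01:55:34"/>
  </HTML>
</SYNC_TABLE>""")


def find_segments(keyword):
    """Return [(page_id, weight, av_id, start, end)] for pages with the keyword."""
    # Step 1: Sync Table lookup structure, page id -> AV_REF element.
    av_refs = {page.get("id"): page.find("AV_REF") for page in SYNC_TABLE.findall("HTML")}
    hits = []
    # Step 2: Keyword Table gives the pages (and weights) for the keyword.
    for entry in KEYWORD_TABLE.findall("KEYWORD"):
        if entry.get("id") != keyword:
            continue
        for ref in entry.findall("CHAPTER_REF"):
            av = av_refs.get(ref.get("idref"))
            if av is not None:
                hits.append((ref.get("idref"), int(ref.get("weight")),
                             av.get("idref"), av.get("start"), av.get("end")))
    return hits
```

With this data, find_segments("FTP") returns the entries for chap15_1 and chap16, the two pages the text names for that keyword.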



Figure 11. The typical Video Query 

In our design, the user-specified keyword is 
transferred to the Query Server through the standard HTTP 
protocol. As shown in Fig. 10, the Query Server then 
encodes the keyword query into XQL (XML Query 
Language) compliant format and then transfers it to the XML 
DBMS. Based on the information in the Synchronization 
Table and Keyword Table, the XML DBMS then encodes 
the XQL-based query pattern from the Query Server into 
SQL-compliant format and then transmits it to the 
Multimedia DBMS to extract the information desired. 
After the HTML and video extraction, the Query Server 
will generate an XML-based Play Description Script 
containing all video segments and HTML URLs relevant to 
the specified keyword. The Play Description Script will 
then be transferred to the AV/HTML Merger for subsequent 
processing. 
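
The last step of this pipeline, turning the extracted hits into a Play Description Script, can be sketched as follows. This is a hedged illustration using Python's xml.etree.ElementTree, not the authors' servlet code: the function name build_play_description and the input tuple layout are hypothetical, and ordering by descending weight is an assumption (the paper states only that segments are sorted by keyword weighting).

```python
import xml.etree.ElementTree as ET


def build_play_description(keyword, hits):
    """Serialize hits [(page_id, weight, av_id, start, end)] as a script string."""
    root = ET.Element("PLAY_DESCRIPTION")
    ET.SubElement(root, "QUERY", keyword=keyword)
    result = ET.SubElement(root, "RESULT")
    # Assumption: pages with a higher keyword weight are presented first.
    for page_id, _weight, av_id, start, end in sorted(
            hits, key=lambda hit: hit[1], reverse=True):
        page = ET.SubElement(result, "HTML", id=page_id)
        ET.SubElement(page, "AV_REF", idref=av_id, start=start, end=end)
    return ET.tostring(root, encoding="unicode")
```

The weight itself is not written into the script, consistent with the script format in which only page ids and AV references appear.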




Figure 12. AV/HTML Merger 



3.4. AV/HTML Merger 

The AV/HTML Merger is responsible for integrating 
the HTML and AV for synchronous presentation at the 
user's site. The Merger first parses the XML-based Play 
Description Script received from the Query Server and 
then fetches the HTML pages and lecture video files for 
presentation style formatting. The Merger uses XSL 
[25] to format the output into HTML pages. The detailed 
process of AV/HTML merging is illustrated in 
Fig. 12. 

An example of the Play Description Script is illustrated in 
Table 3. The Play Description will sort the video segments 
for presentation according to the weighting of the indexed 
keyword assigned in the HTML Keyword Analyzer. 



<PLAY_DESCRIPTION> 
  <QUERY keyword="protocol address"/> 
  <RESULT> 
    <HTML id="chap15_1"> 
      <AV_REF idref="ch15_av" 
              start="00:00:00" end="00:55:19"/> 
    </HTML> 
    <HTML id="chap15_2"> 
      <AV_REF idref="ch15_av" 
              start="00:55:19" end="01:30:33"/> 
    </HTML> 
    <HTML id="chap16"> 
      <AV_REF idref="ch16_av" 
              start="00:00:00" end="01:55:34"/> 
    </HTML> 
  </RESULT> 
</PLAY_DESCRIPTION> 

Table 3. Play Description 
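
The Merger's parse-and-format step can be illustrated on the script of Table 3. The sketch below is an assumption-laden stand-in: it uses plain Python string formatting in place of the XSL stylesheet the paper mentions, and the function name render_results and the output layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

PLAY_DESCRIPTION_XML = """<PLAY_DESCRIPTION>
  <QUERY keyword="protocol address"/>
  <RESULT>
    <HTML id="chap15_1">
      <AV_REF idref="ch15_av" start="00:00:00" end="00:55:19"/>
    </HTML>
    <HTML id="chap15_2">
      <AV_REF idref="ch15_av" start="00:55:19" end="01:30:33"/>
    </HTML>
    <HTML id="chap16">
      <AV_REF idref="ch16_av" start="00:00:00" end="01:55:34"/>
    </HTML>
  </RESULT>
</PLAY_DESCRIPTION>"""


def render_results(play_description_xml):
    """Parse a Play Description Script and format it as a simple HTML page."""
    root = ET.fromstring(play_description_xml)
    keyword = root.find("QUERY").get("keyword")
    items = []
    for page in root.find("RESULT").findall("HTML"):
        ref = page.find("AV_REF")
        items.append("<li>{} ({}: {} to {})</li>".format(
            escape(page.get("id")), escape(ref.get("idref")),
            escape(ref.get("start")), escape(ref.get("end"))))
    return "<html><body><h1>Results for {}</h1><ul>{}</ul></body></html>".format(
        escape(keyword), "".join(items))
```

The list preserves the order of the script, so whatever ordering the Query Server chose carries through to the page shown to the user.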

4. System Implementation 

To evaluate the feasibility of the proposed 
framework, we have built a prototype using the RealAudio 
System [26]. The RealPlayer plug-in embedded in 
Netscape Navigator is used as the AV player in the WSML 
Browser. The RealServer is used as the AV server and the 
RVEncoder as the AV encoder in the WSML 
Recorder. The synchronization of the AV and the HTML pages 
is implemented using the "rmmerge" tool provided by the 
RVEncoder, which actually embeds the HTML URL events 
into the AV bit stream. 

We use the Java XML Parser [28] and XML4J [27] as 
tools to develop the system. GAIS [20] is used to 
generate the index table for the HTML-based lecture notes. 
The Java Servlet technology [28] is used to generate the 
query result as dynamic HTML pages to be sent to 
the client. 



Figure 13. Snapshot of the WSML System 

Figure 14. Snapshot of the XSHVQ System 



Currently, the proposed XSHVQ system is integrated 
with our WSML system. Snapshots of both systems, showing 
the navigation process in the WSML system and typical query 
results in the XSHVQ system, are illustrated in Fig. 13 and 
Fig. 14, respectively. In this example, a user wished to 
query all the video segments with content relevant to 
"protocol address". The user used the keyword "protocol 
address" to query the XSHVQ system. The XSHVQ 
system responds to the query by displaying all the HTML 
notes containing the keyword "protocol address". The user 
is then allowed to click the pages to request the HTML/AV 
for synchronous presentation. 

5. Conclusion 

In this paper, we present a synchronized and 
retrievable hypermedia course system for industry 
employee training. The WSML framework has been 
designed to synchronize the presentation of the streaming 
video lecture, the HTML lecture notes, and the HTML 
navigation events. The system automates the recording of 
those media events and significantly reduces the cost of 
producing multimedia courses. Integrated with the WSML 
framework, the XSHVQ framework provides an efficient 
query tool. It allows users to use "keyword" search for 
desired video segments or HTML lecture notes in our 
synchronized database. The prototype of this system was 
implemented and its feasibility was verified. 